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directory cache entries than a total number of memory blocks in the 
memory module. The method includes the step of ascertaining whether the 
memory block is currently cached in the partial directory cache. If the 
memory block is currently cached in the partial directory cache, the 
first memory access request is serviced using a directory protocol. If 
the memory block is not currently cached in the partial directory cache, 
the first memory access request is serviced using a directory-less 
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units; with each page including two or more cache lines. Accordingly, 
during the execution of a program, cache-line-sized components of a 
page-sized block of data are incrementally stored in the cache lines of 
the LLCs. Subsequently, the system determines that it is time to review 
the allocation of cache resources, i.e., between the LLC and the HLC. The 
review trigger may be external to the processor, e.g., a timer 
interrupting the processor on a periodic basis. Alternatively, the review 
trigger may be from the LLC or the HLC, e.g., when the LLC is full, or 
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respective blocks of data and determining if the number of cached 
components identified with the blocks exceeds a threshold. If the 
threshold is exceeded for cached components associated with a particular 
block, space is allocated in the HLC for storing components from the 
block. This scheme advantageously increases the likelihood of future 
cache hits by optimally using the HLC to store blocks of memory with a 
substantial number of useful components. 
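
The review step described above can be sketched as a simple count-and-compare pass. This is only an illustration under stated assumptions: the function name `blocks_to_promote`, the representation of cached components as integer line indices, and the fixed `lines_per_page` are hypothetical and do not come from the abstract.

```python
from collections import Counter

def blocks_to_promote(cached_lines, lines_per_page, threshold):
    """Return the page numbers whose count of cached lines exceeds threshold.

    cached_lines   -- line indices currently held in the lower-level cache (assumed layout)
    lines_per_page -- number of cache lines per page-sized block (assumed constant)
    threshold      -- promote a page only if more than this many of its lines are cached
    """
    # Map every cached line to the page it belongs to and count lines per page.
    counts = Counter(line // lines_per_page for line in cached_lines)
    # Pages over the threshold are candidates for space in the higher-level cache.
    return sorted(page for page, n in counts.items() if n > threshold)

# Lines 0-3 belong to page 0, lines 4-7 to page 1, lines 8-11 to page 2.
promoted = blocks_to_promote([0, 1, 2, 5, 8, 9, 10, 11], lines_per_page=4, threshold=2)
```

Here pages 0 and 2 have three and four cached lines respectively, so both exceed the threshold of 2, while page 1 (one cached line) does not.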
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ABSTRACT EP 817079 A3 

A flexible scheme is provided for designating the appropriate 
write-back protocol best suited for each memory level within a 
multi-level-cache computer system. The skip-level memory hierarchy of the 
present invention includes a lower-level copy-back cache and a 
higher-level write-through cache. This greatly simplifies the 
implementation of the higher-level cache, since it may be implemented 
with a write-or-read access to its address tag. Although 
counterintuitive, a write-through higher-level cache in a distributed 
shared memory may also increase the efficiency of the computer system 
without unduly increasing the volume of network traffic within the 
computer system. This is because a write-through higher-level cache 
increases the probability of readily-available cached copies of updated 
data which are consistent with the home copies of the data, thereby 
reducing the number of fetches from remote home locations whenever the 
data is not found in the lower-level cache but is found in the 
higher-level cache. 
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ABSTRACT EP 817078 A3 

An efficient cache allocation scheme is provided for both uniprocessor 
and multiprocessor computer systems having at least one cache. In one 
embodiment, upon the detection of a cache miss, a determination of 
whether the cache miss is "avoidable" is made. In other words, would the 
present cache miss have occurred if the data had been cached previously 
and if the data had remained in the cache? One example of an avoidable 
cache miss in a multiprocessor system having a distributed memory 
architecture is an excess cache miss. An excess cache miss is either a 
capacity miss or a conflict miss. A capacity miss is caused by the 
insufficient size of the cache. A conflict miss is caused by insufficient 
depth in the associativity of the cache. The determination of the excess 
cache miss involves tracking read and write requests for data by the 
various processors and storing some record of the read/write request 
history in a table or linked list. Data is cached only after an avoidable 
cache miss has occurred. By caching only after at least one avoidable cache miss 
instead of upon every (initial) access, cache space can be allocated in a 
highly efficient manner thereby minimizing the number of data fetches 
caused by cache misses. 
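
The allocation policy described above can be sketched as follows. This is a simplified illustration, not the claimed method: the class name `AvoidableMissCache`, the `seen` set standing in for the request-history table, and the `fetch` callback are all hypothetical, and the sketch treats any repeated miss as avoidable without distinguishing capacity misses from conflict misses.

```python
class AvoidableMissCache:
    """Cache that allocates an entry only after an avoidable miss (simplified sketch)."""

    def __init__(self):
        self.cache = {}    # address -> data currently cached
        self.seen = set()  # hypothetical history table: addresses requested before

    def read(self, address, fetch):
        """Return (data, outcome); fetch is a callable that loads data on a miss."""
        if address in self.cache:
            return self.cache[address], "hit"
        data = fetch(address)
        if address in self.seen:
            # The data was requested earlier, so caching it then would have
            # avoided this miss: allocate cache space now.
            self.cache[address] = data
            return data, "avoidable miss"
        # First access: record it in the history but do not allocate cache space.
        self.seen.add(address)
        return data, "first miss"


cache = AvoidableMissCache()
load = lambda addr: addr * 2  # stand-in for a fetch from the home location
outcomes = [cache.read(5, load)[1] for _ in range(3)]
```

The three reads of the same address produce a first miss, an avoidable miss, and then a hit, which shows that space is allocated only on the second request.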
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ABSTRACT EP 817071 A2 

A computer system includes a directory at each node which stores 
coherency information for the coherency units for which that node is the 
home node. In addition, the directory stores a data access state 
corresponding to each coherency unit which indicates the data access 
pattern observed for that coherency unit. The data access state may 
indicate migratory or non-migratory data access patterns. If the 
coherency unit has been observed to have a migratory data access pattern, 
then read/write access rights are granted. Conversely, if the coherency 
unit has been observed to have non-migratory data access patterns, then 
read access rights are granted. The home node further detects the 
migratory and non-migratory data access patterns and selects transitions 
between the migratory and non-migratory data access states independent of 
the cache hierarchies within the nodes which access the affected 
coherency unit. In one embodiment, a pair of counters are employed for 
each coherency unit. One of the counters is incremented when the 
coherency unit is in the migratory data access state and a data migration 
is detected. The other counter is incremented when the coherency unit is 
in the non-migratory data access state and a data migration is detected. 
When one of the counters overflows, the home node transitions the data 
access state of the coherency unit to the alternate data access state. 
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ABSTRACT EP 817069 A1 

A coherence transformer for allowing a computer node and one or more 
external devices to share memory blocks having local physical addresses 
at a memory module of the computer node. The coherence transformer 
includes logic for ascertaining whether a memory access request from the 
external device for a memory block should be responded to using a 
snoop-only approach or an Mtag-only approach. The snoop-only approach 
requires a tag in a snoop tag array of the coherence transformer be 
available to track the memory block for an entire duration that the 
memory block is cached by the external device. The Mtag-only approach 
only temporarily stores the memory block until a global state associated 
with the memory block can be written back into the memory module of the 
computer node. The snoop tag array allows the coherence transformer to 
snoop the bus of the computer node to intervene and respond to memory 
access requests pertaining to a memory block externally cached and 
tracked by the snoop tag array. 
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An apparatus for facilitating the sharing of memory blocks between a 
computer node and an external device irrespective whether the external 
device and the common bus both employ a common protocol and irrespective 
whether the external device and the common bus both operate at the same 
speed. Each of the memory blocks has a local physical address at a memory 
module of the computer node and an associated Mtag for tracking a state 
associated with that memory block, including a state for indicating 
whether that memory block is exclusive to the computer node, a state for 
indicating whether that memory block is shared by the computer node with 
the external device, and a state for indicating whether that memory block 
is invalid in the computer node. The apparatus includes receiver logic 
configured for coupling with a common bus of the computer node, the 
receiver logic being configured to receive, when coupled to the common 
bus, memory access requests specific to the apparatus on the common bus. 
There is further included a protocol transformer logic coupled to the 
receiver logic for enabling the apparatus, when coupled to the external 
device, to communicate with the external device using a protocol suitable 
for communicating with the external device. 

ABSTRACT WORD COUNT: 206 



LEGAL STATUS (Type, Pub Date, Kind, Text):
 Examination:  010110 A1  Date of dispatch of the first examination report: 20001123
 Application:  980107 A1  Published application (A1 with Search Report; A2 without Search Report)
 Assignee:     030423 A1  Transfer of rights to new applicant: Sun Microsystems, Inc. (2616592) 4150 Network Circle Santa Clara, California 95054 US
 Examination:  980729 A1  Date of filing of request for examination: 980527
 Change:       980916 A1  Designated Contracting States (change)
LANGUAGE (Publication, Procedural, Application): English; English; English
FULLTEXT AVAILABILITY:
Available Text  Language   Update  Word Count
CLAIMS A        (English)  9802    3015
SPEC A          (English)  9802    9063
Total word count - document A       12078
Total word count - document B       0
Total word count - documents A + B  12078



4/5/16 (Item 10 from file: 348)
DIALOG(R) File 348: EUROPEAN PATENTS
(c) 2004 European Patent Office. All rts. reserv.

00893681
Methods and apparatus for a coherence transformer for connecting computer system coherence domains
Verfahren und Vorrichtung fur einen Koharenzumwandler zur Verbindung von Rechnersystemkoharenzdomanen
Procede et dispositif pour transformateur de coherence permettant la connexion des domaines de coherence de systeme d'ordinateur
PATENT ASSIGNEE:
  Sun Microsystems, Inc., (2616592), 4150 Network Circle, Santa Clara, 
    California 95054, (US), (Proprietor designated states: all)
INVENTOR:
  Hagerstein, Erik E, 3451 Cork Oak Way, Palo Alto CA 94043, (US)
  Hill, Mark Donald, 2124 Chamberlain Avenue, Madison WI 53705, (US)
  Wood, David A., 2115 Bascom Street, Madison WI 53705, (US)
LEGAL REPRESENTATIVE:
  Turner, James Arthur et al (74631), D. Young & Co., 21 New Fetter Lane, 
    London EC4A 1DA, (GB)
PATENT (CC, No, Kind, Date): EP 817065 A1 980107 (Basic)
                             EP 817065 B1 030917
APPLICATION (CC, No, Date) : EP 97304519 970625; 
PRIORITY (CC, No, Date) : US 677015 960701 
DESIGNATED STATES: DE; FR; GB; IT; NL; SE 
INTERNATIONAL PATENT CLASS: G06F-012/08 
CITED PATENTS (EP B) : EP 392657 A; EP 801349 A 
CITED REFERENCES (EP B):
  LENOSKI D ET AL: "THE STANFORD DASH MULTIPROCESSOR" COMPUTER, vol. 25, 
    no. 3, March 1992, pages 63-79, XP000288291
  O'KRAFKA B W ET AL: "AN EMPIRICAL EVALUATION OF TWO MEMORY-EFFICIENT 
    DIRECTORY METHODS" PROCEEDINGS OF THE ANNUAL INTERNATIONAL SYMPOSIUM ON 
    COMPUTER ARCHITECTURE, SEATTLE, MAY 28 - 31, 1990, no. SYMP. 17, 
    INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, pages 138-147, 
    XP000144792
  LOVETT T ET AL: "STING: A CC-NUMA COMPUTER SYSTEM FOR THE COMMERCIAL 
    MARKETPLACE" COMPUTER ARCHITECTURE NEWS, vol. 24, no. 2, May 1996, 
    pages 308-317, XP000592 r95;

ABSTRACT EP 817065 A1 

An apparatus for facilitating the sharing of memory blocks, which have 
local physical addresses at a computer node, between the computer node 
and an external device. The apparatus includes snooping logic configured 
for coupling with a common bus of the computer node. The snooping logic 
is configured to monitor, when coupled to the common bus, memory access 
requests on the common bus. There is also included a snoop tag array 
coupled to the snooping logic. The snoop tag array includes tags for 
tracking all copies of a first plurality of memory blocks of the memory 
blocks cached by the external device. Further, there is included a 
protocol transformer logic coupled to the snooping logic for enabling the 
apparatus, when coupled to the external device, to communicate with the 
external device using a protocol suitable for communicating with the 
external device. 
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A method in a computer network having a first plurality of nodes 
coupled to a common network infrastructure and a distributed shared 
memory distributed among the first plurality of nodes for servicing a 
first memory access request by a first node of the computer network 
pertaining to a memory block having a home node different from the first 
node in the computer network. The computer network has no natural 
ordering mechanism and natural broadcast for servicing memory access 
requests from the plurality of nodes. The home node has no centralized 
directory for tracking states of the memory block in the plurality of 
nodes. The method includes the step of receiving via the common network 
infrastructure at the home node from the first node the first memory 
access request for the memory block. There is also included the step of 
sending, if the home node does not have a first valid copy of the memory 
block, a request from the home node to a second plurality of nodes in the 
computer network to request a second node in the computer network to send 
the first valid copy of the memory block to the first node. The second 
plurality of nodes represents the first plurality of nodes excepting the 
first node and the home node. The first valid copy of the memory block 
represents a valid copy that is capable of servicing the first memory 
access request. 
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together with a cache-coherent protocol for a computer system having a 
plurality of sub-systems coupled to each other via a system interconnect. 
In one implementation, each sub-system includes at least one processor, a 
page-oriented COMA cache and a line-oriented hybrid NUMA/COMA cache. Such 
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able to independently store data in COMA mode or in NUMA mode. When 
caching in COMA mode, a sub-system allocates a page of memory space and 
then stores the data within the allocated page in its COMA cache. 
Depending on the implementation, while caching in COMA mode, the 
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English Abstract 

A portion of the global memory of a multiprocessing computer system is 
allocated to each node, called local memory space. Data from a remote 
node may be copied to local memory space of a node such that accesses to 
the data may be performed locally rather than globally. The copied data 
is referred to as a shadow page. The global address of the data is 
translated to a local physical address for the node to which the data is 
copied. To reduce the size of the translation tables for converting 
between global addresses and local physical addresses, the pages to which 
shadow copies may be stored and which global addresses may be converted 
to local physical addresses may be restricted. Multiple pages of local 
memory space may be allocated to one entry of a local physical address to 
global address (LPA2GA) table. When a page is allocated to store shadow 
pages, an entry in the LPA2GA table associated with that page is marked 
as unavailable. In a similar manner, multiple pages of the global address 
space are mapped to an entry in a global address to local physical 
address (GA2LPA) translation table. To decrease the probability that an 
entry is not available for a page, the GA2LPA table may be implemented as 
a set associative table. To further increase the availability of entries 
in the GA2LPA table, a skewed-associative cache that implements an 
insertion algorithm that realigns the translations in the table to 
maximize the utilization of the available entries is implemented. A 
coherent memory replication (CMR) address space stores shadow pages of 
data from remote nodes and a local address space stores local data. A bit 
within a local physical address identifies whether data is a shadow page, 
which is stored in CMR space, or local data, which is stored in local 
address space. 
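
Two of the mechanisms in this abstract, the address bit that separates CMR space from local space and the set-associative GA2LPA table, can be sketched as below. The bit position `CMR_BIT`, the modulo set index, and the class and method names are assumptions made for illustration; the abstract does not specify them, and the skewed-associative realignment step is not modelled here.

```python
CMR_BIT = 1 << 40  # hypothetical position of the bit that marks CMR (shadow page) space


def is_shadow_page(local_physical_address):
    """True if the address lies in CMR space, False if it is ordinary local data."""
    return bool(local_physical_address & CMR_BIT)


class GA2LPATable:
    """Set-associative global-to-local page translation table (simplified sketch)."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One small dict per set; each holds at most `ways` translations.
        self.sets = [dict() for _ in range(num_sets)]

    def insert(self, global_page, local_page):
        """Add a translation; return False if every way of the set is taken."""
        entries = self.sets[global_page % self.num_sets]
        if global_page not in entries and len(entries) >= self.ways:
            return False  # no entry available for this page
        entries[global_page] = local_page
        return True

    def lookup(self, global_page):
        """Return the local page for global_page, or None if it is not mapped."""
        return self.sets[global_page % self.num_sets].get(global_page)


table = GA2LPATable(num_sets=4, ways=2)
first = table.insert(1, 100)   # global pages 1, 5 and 9 all map to set 1
second = table.insert(5, 101)
third = table.insert(9, 102)   # set 1 already holds two translations
```

With two ways per set the third insertion fails, which is the "entry not available" case that higher associativity and the realigning insertion algorithm are meant to make less likely.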

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Access providing shared memory for computer system, has controller to 
implement one set of rules in coherent mode of operation and another set 
of rules to provide copy of data in memory in read current mode of 
operation 

Patent Assignee: HEWLETT-PACKARD DEV CO LP (HEWP)

Inventor: COWAN J P; EBNER S M; JACKSON C H; SHARMA D D; WICKERAAD J A 
Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 
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Abstract (Basic) : US 6647469 B1 

NOVELTY - The memory has a controller (130) that provides memory 
access to the agents in both coherent and read current modes of 
operation. The controller implements a set of rules in the coherent 
mode of operation to insure that all copies of data stored by the 
agents are coherent with the data stored in the memory. Another set of 
rules is implemented to provide a copy of the data stored in the memory 
in the read current mode. 

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is also included for a 
method of providing memory access to agents. 

USE - Used for providing access to a shared memory in a computer 
system. 

ADVANTAGE - The set of rules in the read current mode of operation 
copies the data to the agent, thereby eliminating the possibility of 
data from becoming stale and misused by another agent. The data 
obtained is limited in a boundary that restricts the read current data 
from causing data corruption. The set of rules improve the useable 
bandwidth for large payload transfers within few steps. 

DESCRIPTION OF DRAWING(S) - The drawing shows a block diagram with 
a shared memory. 

Cell (105) 

System interconnect (115) 

Processor bus (120) 

System memory controller (130) 

Main memory (135) 
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Cache structure for computer, determines fetch size value using sub-block 
use table which includes entries indicating sub-blocks loaded by miss 
processing circuit 
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Abstract (Basic) : US 6557080 Bl 

NOVELTY - A miss processing circuit responds to request from a 
processor (12) for data of a given sub-block (24) not in a cache (16), 
by loading requested data into several sub-blocks which are not 
requested by the processor as determined by fetch size value (38). A 
sub-block use table (32) to determine fetch size value, has entries 
indicating sub-blocks loaded by the miss processing circuit and which 
are provided with data after loading. 

USE - Cache structure for computer. 

ADVANTAGE - Appropriate fetch block size is determined to satisfy 
the requirement of minimized cache misses and minimized superfluous 
traffic between memory and cache. 

DESCRIPTION OF DRAWING (S) - The figure shows the cache structure 
with a sub-block use table. 

processor (12) 

cache (15) 

sub-block (24) 

sub-block use table (32) 

fetch size value (38) 
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Virtual memory control method in computer system, involves setting 
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Abstract (Basic) : US 20030070058 Al 

NOVELTY - A translation entry mapping indicator (132) is set for 
each entry associated with the given context (134) of memory and a 
validity flag (130) is set for each entry associated with the given 
context. The given context is demapped by changing the mapping 
indicator set for each given context. 

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is also included for 
memory management device; and program for controlling virtual memory in 
computer system. 

USE - For controlling physical memory of computer system. 

ADVANTAGE - The time required to demap the given context is reduced 
without degrading the performance of the computer system. Thereby the 
memory management process is performed efficiently. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of 
memory management unit. 

context mapping indicator (122) 

clean-up indicator (124) 

validity flag (130) 

translation entry mapping indicator (132) 
context (134) 
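
The demapping idea in this record can be sketched as an indicator comparison: an entry translates only while its stored indicator matches the current indicator of its context, so changing the context's indicator invalidates all of its entries at once. This is one possible reading of the abstract, not the claimed design; the class `ContextTLB`, its dictionaries, and the use of an incrementing integer as the indicator are hypothetical, and the clean-up indicator (124) is not modelled.

```python
class ContextTLB:
    """Translation table whose contexts are demapped by changing an indicator (sketch)."""

    def __init__(self):
        self.context_indicator = {}  # context id -> current mapping indicator
        self.entries = {}            # (context, virtual page) -> (physical page, indicator, valid)

    def map(self, context, vpage, ppage):
        """Install a translation tagged with the context's current indicator."""
        indicator = self.context_indicator.setdefault(context, 0)
        self.entries[(context, vpage)] = (ppage, indicator, True)

    def translate(self, context, vpage):
        """Return the physical page, or None if the entry is missing or stale."""
        entry = self.entries.get((context, vpage))
        if entry is None:
            return None
        ppage, indicator, valid = entry
        # The entry counts only if it is valid and was created under the
        # context's current indicator.
        if valid and indicator == self.context_indicator.get(context):
            return ppage
        return None

    def demap(self, context):
        """Demap every entry of the context in one step by changing its indicator."""
        self.context_indicator[context] = self.context_indicator.get(context, 0) + 1


tlb = ContextTLB()
tlb.map(1, 0x10, 0x99)
before = tlb.translate(1, 0x10)
tlb.demap(1)
after = tlb.translate(1, 0x10)
```

The demap call touches one indicator instead of walking every entry, which is consistent with the reduced demap time the ADVANTAGE paragraph claims; stale entries would still need to be reclaimed separately.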
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Abstract (Basic) : US 6401174 Bl 

NOVELTY - A system interface has error status registers to store 
error information of transactions initiated by processors through a 
local bus. The system interface includes a request agent coupled to the 
error status registers. 

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is included for error 
information communicating method in multiprocessing system. 

USE - Multiprocessing computer system. 

ADVANTAGE - Supports disclosed error reporting mechanism that 
provides virtualized error information without processor faults or 
traps. 

DESCRIPTION OF DRAWING (S) - The figure shows the flowchart for 
error reporting in multiprocessor computer system. 
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of SCSI device to network management system 
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Abstract (Basic) : WO 200246866 A2 

NOVELTY - A request to identify a remote small computer system 
interface (SCSI) device is received at a switch element. An address 
resolution protocol (ARP) entry for the SCSI device is returned based 
on the request. A SCSI read request is transmitted from the switch 
element to the SCSI device and data about existence of SCSI device is 
notified to network management system. 

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are included for the 
following: 

(1) Method of enabling the NMS to receive SNMP traps in response to 
generation of SCSI exceptions by the remote SCSI device; and 

(2) IP data network. 

USE - For identifying remote SCSI device in IP data network (claimed). 

ADVANTAGE - Enables automatic identification of SCSI devices over 
the IP network and mapping of SNMP requests to SCSI. Provides WAN 
mediation caching on local devices. 

DESCRIPTION OF DRAWING (S) - The figure shows the architecture of 
the switch system. 
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Abstract (Basic) : US 20020038398 Al 

NOVELTY - A request for locked transaction is detected through a 
32-bit bus (116). The status of a register is changed to quiesce the 
computer system, when a shared resource required for transaction is 
obtained. The locked transaction is sent through a 64-bit bus (115) for 
execution. 

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is also included for 
locked transaction permitting apparatus. 

USE - For permitting locked transaction within computer system. 

ADVANTAGE - The controllers function effectively to permit bus 
locking in a system having a bus that does not use traditional bus 
locking. 

DESCRIPTION OF DRAWING (S) - The figure shows the computer system. 
64-bit bus (115) 
32-bit bus (116) 
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Abstract (Basic) : US 20020019921 Al 

NOVELTY - An address circuit converts input address to primary and 
secondary look-up addresses which correspond to primary and secondary 
entries. A data corresponding to the input address is stored in the 
primary entry if primary entry is available, otherwise data is stored 
in the secondary entry. If both primary and secondary entries are not 
available, an alternate address is determined. 

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the 
following : 

(a) Data storage and retrieval method; 

(b) Method of increasing utilization of look-up table. 

USE - For storing data corresponding to input address in 
multiprocessor computer systems. 

ADVANTAGE - The access time of the look-up table is minimized and 
utilization of the table is maximized by realigning data stored in the 
table if an entry for new data is not available. 

DESCRIPTION OF DRAWING (S) - The figure shows a block diagram of the 
multiprocessor computer system. 
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Abstract (Basic) : US 20020004886 Al 

NOVELTY - A system interface (24) coupled between a global bus and 
local bus, receives global transaction including address from the 
remote node (12). The address is used for selecting particular entry of 
memory management unit (76). The selected entry has a field including a 
specific value for controlling the operation of local bus, in response 
to global transaction. 

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is also included for 
method of operating a multiprocessing computer system. 

USE - Multiprocessing computer system e.g. symmetric 
multiprocessing computer system. 

ADVANTAGE - Enables a node to control the resources used by remote 
cluster nodes and simple cluster communication protocols are 
implemented at the global level. Access restrictions are specified 
flexibly without the requirement of large memory capacity. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of 
symmetric multiprocessing node depicted multiprocessor computer system. 

Remote node (12) 
System interface (24) 
Memory management unit (76) 
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Abstract (Basic) : US 6308246 Bl 

NOVELTY - A look-up address circuit receives input address and 
converts into primary or secondary look-up addresses relative to 
primary or secondary entry. Look-up table stores a datum in primary 
entry if primary entry is available, else stores in secondary entry and 
moves a specified datum from primary entry to alternate entry if both 
the entries are not available for the datum, to store the datum in 
primary entry. 

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is also included for 
data storing and retrieving method. 

USE - For multiprocessor computer system. 

ADVANTAGE - Since the method provides an entry to store new datum 
by moving the datum in primary entries to alternative entry, table 
utilization is increased, thereby approaches utilization of fully 
associative table. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of 
symmetric multiprocessing node. 
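
The storage rule in this record (try the primary entry, then the secondary entry, and otherwise move the datum occupying the primary entry to its own alternate entry) can be sketched as below. This is a minimal sketch under stated assumptions: the class `TwoChoiceTable`, the two index functions, and the single-step relocation are hypothetical choices for illustration and are not taken from the patent.

```python
class TwoChoiceTable:
    """Look-up table with a primary and a secondary entry per key (sketch)."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size  # each slot holds (key, value) or None

    def _primary(self, key):
        return key % self.size                  # hypothetical primary index function

    def _secondary(self, key):
        return (key // self.size) % self.size   # hypothetical secondary index function

    def lookup(self, key):
        """Check the primary entry, then the secondary entry."""
        for idx in (self._primary(key), self._secondary(key)):
            slot = self.slots[idx]
            if slot is not None and slot[0] == key:
                return slot[1]
        return None

    def insert(self, key, value):
        """Store value for key; return False if no entry can be made available."""
        p, s = self._primary(key), self._secondary(key)
        # Update in place if the key is already stored.
        for idx in (p, s):
            if self.slots[idx] is not None and self.slots[idx][0] == key:
                self.slots[idx] = (key, value)
                return True
        # Prefer the primary entry, then fall back to the secondary entry.
        for idx in (p, s):
            if self.slots[idx] is None:
                self.slots[idx] = (key, value)
                return True
        # Both entries are taken: try to move the occupant of the primary
        # entry to its own alternate entry, then store the new datum.
        occ_key, occ_val = self.slots[p]
        if self._primary(occ_key) == p:
            alt = self._secondary(occ_key)
        else:
            alt = self._primary(occ_key)
        if alt != p and self.slots[alt] is None:
            self.slots[alt] = (occ_key, occ_val)
            self.slots[p] = (key, value)
            return True
        return False


table = TwoChoiceTable(4)
table.insert(1, "a")            # primary entry 1 is free
moved = table.insert(5, "b")    # entries for key 5 are taken, so key 1 is relocated
```

After the second insertion, key 1 sits in its secondary entry (slot 0) and key 5 in slot 1, and both remain reachable through `lookup`. Relocating only one occupant keeps lookups to two probes; a fuller version might chain several relocations.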
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Abstract (Basic) : JP 2000322317 A 

NOVELTY - The address error which occurs in local channel (130) of 
computer system (100) is detected. The corresponding coherency 
condition is read from one or more address line specified from cache 
memory (120) and recovery routine is called for rectifying the address 
error. 

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is also included for 
address error recovery system. 

USE - For recovering address error in computers and processors. 

ADVANTAGE - The cache coherency information is utilized, hence 
address error is easily detected. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of 
multiprocessor computing system. 

Computer system (100) 

Cache memory (120) 

Local channel (130) 
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Backup memory storage apparatus for digital data processing system has 
memory to store audit trail entries based on indication of controller 
which comprises request of audit trail entries 

Patent Assignee: UNISYS CORP (BURS)

Inventor: COOPER T P; HILL M J; KONRAD D R; NOWATZKI T L 
Number of Countries: 001 Number of Patents: 001 
Patent Family: 
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US 6079000 A 20000620 US 971136 A 19971230 200057 B 

Priority Applications (No Type Date) : US 971136 A 19971230 
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US 6079000 A 27 G06F-012/00 

Abstract (Basic) : US 6079000 A 

NOVELTY - A controller is coupled to three memory units. A portion 
of audit trail entries from third memory is stored in first memory 
until an indication is received from the controller, where the 
indication comprises synchronous audit data request of audit trail 
entries having a commit in progress status. 

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is also included for 
backup memory storage method. 

USE - For digital data processing system. 

ADVANTAGE - Since the memory stores audit trail entries based on 
indication of controller which comprises request of audit trail entries 
having commit in progress status, the transfer efficiency of audit 
trail entries is improved. 

DESCRIPTION OF DRAWING (S) - The figure shows the flow chart 
explaining the backup memory storage method. 
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Abstract (Basic) : US 6018746 A 

NOVELTY - The memory architecture comprises several storage modules 
for holding transaction recovery information and several memory 
structures corresponding to a distinct transaction to isolate 
accessibility of the recovery information. Each memory structure 
comprises a control information field and several information address 
fields. The memory structures are non-volatile RAM and are arranged in 
a linked list. Also included is an INDEPENDENT CLAIM for a transaction 
processing system. 

USE - For use in a multi-processing environment such as airline or 
banking systems. 

ADVANTAGE - Prevents loss of transaction information. Provides a 
centralized, commonly accessible system for effecting recovery actions 
in a data processing system. 

DESCRIPTION OF DRAWING (S) - The figure is a block diagram of system 
providing task management. 
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Abstract (Basic) : WO 9912103 A2 

NOVELTY - The multiprocessor system has symmetrical multiprocessor 
nodes connected with other nodes to form clusters. The nodes have 
interfaces (24) with input (84) and output (86) request queues. Tables 
(80,82) handle translations between global and local addresses. Memory 
areas can be set up as shadow pages. The global to local address 
translation table is set associative. An insertion algorithm is used in 
this skewed associative cache to realign translations for maximum 
utilization.

USE - Global address translation in cluster systems 

ADVANTAGE - By using set associative and an insertion algorithm the 
access time and memory needed are optimized. 

DESCRIPTION OF DRAWING (S) - Cluster node interface 
System interface (24) 
Network I/O queues (84,86) 

Global/Local address translation tables (80,82) 

Queues to symmetrical processor node (92,96)

Access control agents (100,504) 

Cluster memory management (504) 
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Abstract (Basic) : WO 9912102 A1

NOVELTY - The multiprocessor system has symmetrical multiprocessor 
nodes connected with other nodes to form clusters. The nodes have 
interfaces (24) with input (84) and output (86) request queues. These 
are handled by cluster agents (100-502) that reference a cluster memory
management unit (504) to determine valid accesses. The memory
management unit includes values that dictate what type of operations
are permitted on the local node by a remote node. Error status is also
monitored.

USE - Access control for clustered processor nodes 

ADVANTAGE - Provides communication protocols for interconnecting 
clusters under user and kernel level control.

DESCRIPTION OF DRAWING (S) - Cluster node interface 

System interface (24) 

Network I/O queues (84,86) 

Queues to symmetrical processor node (92,96) 

Access control agents (100,504) 

Cluster memory management (504) 
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Abstract (Basic) : EP 818732 A 

The method involves receiving via a common network 
infrastructure at a home node from the first node a first memory 
access request for a memory block (708). If directory states 
representing states of copies of the memory block on the first 
number of nodes are cached in a directory cache (702) entry of 
the partial directory cache, the first memory (704) access 
request is serviced using a directory protocol. This is 
performed by consulting the directory cache entry to determine 
which node in the computer network currently possesses a first 
valid copy of the memory block. 

The latter represents a valid copy of the memory block that is 
capable of servicing the first memory access request. If the 
directory states related to the memory block are not cached in 
the partial directory cache, the first memory access request is 
serviced using a directory-less protocol. 

ADVANTAGE - Permits directory entries corresponding to memory
blocks of network distributed shared memory to be accessed in
servicing memory access requests.
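
The choice between the two protocols described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the class name HomeNode, the node names, and the use of a broadcast-style query for the directory-less path are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class HomeNode:
    """Hypothetical home node holding a partial directory cache."""

    # block number -> node holding a valid copy; only some blocks have an entry
    dir_cache: dict = field(default_factory=dict)
    # All nodes on the network; used only to model the directory-less path.
    nodes: tuple = ("node1", "node2", "node3")

    def service(self, block: int, requester: str) -> str:
        owner = self.dir_cache.get(block)
        if owner is not None:
            # Directory protocol: the cached entry names the node with a valid copy.
            return f"directory: ask {owner} to send block {block} to {requester}"
        # Directory-less protocol (assumed here to query every other node).
        others = [n for n in self.nodes if n != requester]
        return f"directory-less: ask {', '.join(others)} for block {block}"
```

A request for a block with a cached directory entry is forwarded to a single node, while a request for any other block falls back to querying the remaining nodes.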
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Multiprocessor computer system with extended symmetrical architecture - 
has repeater that generates incoming control signal for controlling when 
first processor element receives transactions from first incoming queue 
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Abstract (Basic) : EP 817095 A 

The system includes a repeater (34a) that receives incoming and
transmits outgoing transactions. A first bus (34) is coupled to the 
repeater by the bus. The first bus includes a first incoming queue and 
a first processor element. The latter receives the incoming 
transactions from the repeater and the first processor element receives 
the outgoing transaction from the first incoming queue. The repeater 
generates an incoming control signal for controlling when the first 
processor element receives transactions from the first incoming queue. 

The first processor element receives each of the outgoing 
transactions from the first incoming queue at approximately the same 
time as each of the outgoing transactions are received by other devices 
in the multiprocessor computer system. 

USE - As architectural connection within multiprocessor computer 
system. 

ADVANTAGE - Maintains memory coherency between each node. Allows
maximum bus bandwidth to be used.
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Node in multiprocessor computer system - has first repeater that is 
coupled to top level interface by upper level bus, and including incoming 
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Abstract (Basic) : EP 817092 A

The node includes a top level interface that receives incoming 
transactions and transmits outgoing transactions. The latter originate 
in the node and the incoming transactions do not originate in the node.
An upper level bus (22) is provided and a first repeater (34). The
latter is coupled to the top level interface by the upper level bus. 
The first repeater includes an incoming queue and a bypass path. The 
first repeater receives the incoming transactions from the top level 
interface and transmits the incoming transactions via the bypass path 
to a lower level bus (32).

The first repeater receives the outgoing transaction via the
incoming queue to a lower level bus. The top level interface receives
the incoming transactions on a unidirectional point-to-point link with
another node. The top level interface transmits the outgoing
transactions on a unidirectional point-to-point link with the other
node.

USE - As symmetrical multiprocessor architecture. 

ADVANTAGE - Maintains memory coherency between each node without 
using coherency state tags.
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Abstract (Basic) : EP 817080 A 

The method involves incrementally storing cache-line-sized 
components of a page-sized block of data in at least two of the cache 
lines of the lower level cache. A trigger for reviewing cache space 
allocation is then detected. At least two cache lines are identified 
from among the number of cache lines of the lower level cache as 
storing the components of the data. If the number of the identified at
least two cache lines exceeds a threshold, one of the number of pages
of the higher level cache is allocated.

The components are stored in at least two corresponding cache lines 
of the allocated page of the higher level cache. 

ADVANTAGE - Provides efficient mechanism to select data structures 
for caching which optimises allocation of higher level cache memory 
space in multi-level computer system to maximise usage of cache and 
minimise overall access time to data.
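
The review step in this abstract (count the lower level cache lines that belong to each page, then allocate a higher level cache page when the count exceeds a threshold) might look like the sketch below. The page size of four lines, the threshold of two, and all function names are assumptions chosen for illustration, not values from the patent.

```python
from collections import Counter

LINES_PER_PAGE = 4  # assumed: cache lines per page-sized block
THRESHOLD = 2       # assumed: promote a page holding more than this many cached lines


def page_of(line_addr: int) -> int:
    """Map a cache-line address to the page-sized block that contains it."""
    return line_addr // LINES_PER_PAGE


def review_allocation(llc_lines, hlc_pages, threshold=THRESHOLD):
    """On a review trigger, allocate HLC pages for well-populated blocks.

    llc_lines: set of line addresses currently held in the lower level cache.
    hlc_pages: dict of page number -> list of line addresses stored in the HLC.
    """
    counts = Counter(page_of(addr) for addr in llc_lines)
    for page, cached in sorted(counts.items()):
        if cached > threshold and page not in hlc_pages:
            # Copy the page's cached components into the newly allocated HLC page.
            hlc_pages[page] = sorted(a for a in llc_lines if page_of(a) == page)
    return hlc_pages
```

With lines 0, 1 and 2 cached, page 0 holds three components and is promoted; pages with a single cached line are left alone.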
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Abstract (Basic) : EP 817079 A 

The method involves determining that a dirty copy of the data of a 
lower-level cache needs to be replaced by writing back the dirty copy
from the lower-level cache to the home location, by updating the stale 
copy of data in the home location. The stale copy of the data in the 
higher-level cache is then updated or invalidated, thus ensuring that 
any copy of the data remaining in the upper-level cache is consistent 
with the updated copy of data in the home location. 

The method further entails requesting an exclusive copy of the data 
from the home location. The dirty copy is written back from the 
lower-level cache to the home location, by updating the stale copy of 
data in the home location. 

USE - In computer system memories. 

ADVANTAGE - Provides flexible scheme for designating memory write 
back protocols for multiple level of memories within computer system 
for data coherency. 
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Abstract (Basic) : EP 817078 A 

The method involves searching for the data in a cache, then 
detecting a cache miss when the data cannot be found in the cache. If 
the cache miss is detected, then it requires fetching the data from a 
main memory. If the cache miss is determined to be the avoidable cache 
miss, then caching the data in the cache. The avoidable cache miss is
an excess cache miss or a capacity miss of the data or a conflict miss
of the data.

The method further entails writing back any older data displaced 
from the cache to the main memory as a result of caching the data, 
while a count of the avoidable cache miss(es) of the data and a count
of cache hit(s) of the older data are maintained. The data is cached
and the older data is written back only if the count of the avoidable
cache miss(es) exceeds the count of the cache hit(s).

ADVANTAGE - Minimises data fetches caused by cache misses since 
likelihood of data being accessed again increases dramatically if it 
has been accessed at least twice. 
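
A minimal sketch of the counting rule in this abstract follows, assuming a direct-mapped cache and treating every miss after the first miss to an address as avoidable. Both simplifications, and the class and attribute names, are assumptions made for illustration.

```python
class CountingCache:
    """Hypothetical direct-mapped cache that replaces a line only when justified."""

    def __init__(self, num_sets: int):
        self.num_sets = num_sets
        self.lines = {}      # set index -> resident address
        self.hits = {}       # resident address -> hits since it was cached
        self.avoidable = {}  # address -> count of avoidable (repeat) misses
        self.seen = set()    # addresses that have missed at least once

    def access(self, addr: int) -> str:
        idx = addr % self.num_sets
        resident = self.lines.get(idx)
        if resident == addr:
            self.hits[addr] += 1
            return "hit"
        # The first miss to an address is compulsory; later misses count as avoidable.
        if addr in self.seen:
            self.avoidable[addr] = self.avoidable.get(addr, 0) + 1
        self.seen.add(addr)
        misses = self.avoidable.get(addr, 0)
        if resident is None or misses > self.hits[resident]:
            self.lines[idx] = addr  # older data (if any) is displaced here
            self.hits[addr] = 0
            return "miss, cached"
        return "miss, bypassed"
```

In a one-set cache, an address that has been hit twice is displaced only after a competing address has accumulated three avoidable misses.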
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Migratory data access pattern detection and handling system for 
multiprocessor computer system - has multiple symmetric multiprocessor 
nodes interconnected by point to point network including cache, SMP bus,
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Abstract (Basic) : EP 817071 A 

The system includes a directory at each node which stores coherency
information for the coherency units for which that node is the home 
node. In addition, the directory stores a data access state 
corresponding to each coherency unit which indicates the data access 
pattern observed for that coherency unit. 

The data access state may indicate migratory or non migratory data
access patterns. If the coherency unit has been observed to have a
migratory data access pattern, read or write access rights are granted.
Conversely, if the coherency unit has been observed to have non
migratory data access patterns, then read access rights are granted.

ADVANTAGE - Increases performance by more efficient handling of
migratory data access patterns while still handling the non migratory
data access patterns efficiently.
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Abstract (Basic) : EP 817069 A 

The method involves using a coherence transformer to allow a 
computer node of one or more external devices to share memory blocks 
having local physical addresses at a memory module of the computer 
node. The coherence transformer includes logic for ascertaining whether 
a memory access request from the external device for a memory block 
should be responded to using a snoop only approach or a memory status 
tag only approach. 

The snoop only approach requires a tag in a snoop tag array of the 
coherence transformer to be available to track the memory block for an 
entire duration that the memory block is cached by the external device. 
The memory status tag approach only temporarily stores the memory block 
until a global state associated with the memory block can be written 
back into the memory module of the computer node. 

ADVANTAGE - Permits memory blocks having local physical address in 
particular computer node to be shared, in an efficient and error free 
manner, among interconnected entities such as internal processing nodes 
and external devices.
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Memory block sharing method between computer node and external device 
by using memory state tag indicating in which of computer node and 
external mode data is valid also coherence transformer for protocol 
switching 
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Abstract (Basic) : EP 817068 A 

The method involves enabling sharing of memory blocks between a 
computer node and an external device irrespective of whether the 
external device and the common bus both employ a common protocol or 
both operate at the same speed. The method also involves employing 
apparatus in which each of the memory blocks has a local physical 
address at a memory module of the computer node and an associated
memory state tag. 

The state tag has states for indicating whether that memory block 
is exclusive to the computer node, is shared by the computer node with 
the external device or is invalid in the computer node. The apparatus 
includes receiver logic configured for coupling with a common bus of 
the computer node and to receive memory access requests specific to the 
apparatus on the common bus.

ADVANTAGE - Facilitates sharing irrespective of whether external 
device and common bus both employ common protocol and whether they 
operate at same speed. 
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Abstract (Basic) : EP 817065 A 

The method involves coupling to a common bus to obtain a first copy 
of a first memory block having a local physical address in the memory 
on behalf of an external device. A transformer receives the first 
memory access request for the first memory block from the external 
device, acquires the first copy of the first memory block from the
common bus and uses a snoop tag array to track the state (exclusive,
shared, invalid) of the first copy at the external device. The first
copy is then sent to the external device from the transformer.

ADVANTAGE - Enables an external device to share memory blocks 
having local physical addresses in a memory module at the computer 
node, irrespective of whether the external device and common bus both
employ a common protocol or whether they operate at the same speed. 
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The method involves using a computer network having multiple nodes 
connected to a common network infrastructure, and a shared memory 
distributed among the nodes, with no natural ordering mechanism and 
natural broadcast for servicing memory access requests from the nodes. 
The first node of the network is enabled to access a copy of a memory 
block having a home node which differs from the first, the home node 
having no centralised directory for storing memory block states in the 
nodes. The memory access request is received at the home node from the 
first node via the common network infrastructure, and the memory block 
status is marked as pending, rendering the home node incapable of 
servicing other memory access requests. If the home node has no valid 
copy of the memory block the home node requests the second set of nodes 
to send a second valid copy of the memory block to the first node. When 
this access request fulfilment acknowledgment is received the memory 
block status is marked as non-pending, allowing other memory access
requests to be serviced. 

ADVANTAGE - Facilitates efficient communication in a computer
network using distributed shared memories. 
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Abstract (Basic): EP 817040 A 

The system includes a dynamic lock structure having a number of 
dynamic lock structure elements, the number of which is fewer in number
than a number of the first number of stored data objects. The mapping 
function renders it likely that only one stored data object of the 
second number of stored data objects that maps into the first dynamic 
lock structure member is accessed at any given point in time by a 
thread of the multiple threads. 

A storage facility is associated with the first dynamic lock 
structure for storing identities of a third number of stored data 
objects. The latter represents a sub-set of the second number of stored 
data objects that are currently being accessed. 

USE - For sharing stored data objects in computer system. 

ADVANTAGE - Avoids access conflicts that may arise when multiple
threads attempt to access same stored data object. 
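
The idea of mapping many data objects onto fewer lock elements, each of which records the objects currently in use, can be sketched as below. The modulo mapping function, the class name DynamicLockTable, and its methods are hypothetical and only illustrate the general scheme.

```python
import threading


class DynamicLockTable:
    """Hypothetical lock table with fewer elements than stored data objects."""

    def __init__(self, num_elements: int):
        # One short-lived guard per element protects that element's bookkeeping.
        self._guards = [threading.Lock() for _ in range(num_elements)]
        # Per element: identities of objects that are currently being accessed.
        self._in_use = [set() for _ in range(num_elements)]

    def _slot(self, object_id: int) -> int:
        # Assumed mapping function: several objects may share one element.
        return object_id % len(self._guards)

    def try_acquire(self, object_id: int) -> bool:
        slot = self._slot(object_id)
        with self._guards[slot]:
            if object_id in self._in_use[slot]:
                return False  # another thread is already accessing this object
            self._in_use[slot].add(object_id)
            return True

    def release(self, object_id: int) -> None:
        slot = self._slot(object_id)
        with self._guards[slot]:
            self._in_use[slot].discard(object_id)
```

Two different objects that map to the same element can both be held at once, because the element stores object identities rather than acting as a single coarse lock.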
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Abstract (Basic) : EP 780770 A 

The method involves storing data in a computer system with several
subsystems coupled to each other by a system interconnect. Each
subsystem includes a processor, a hybrid non uniform memory
architecture (NUMA COMA) cache, a COMA cache and a directory.

Data associated with a data line is stored in the hybrid NUMA COMA 
cache of one of the subsystems. It is determined whether the data 
should also be stored in the COMA cache of the one subsystem in a COMA 
mode.

USE/ADVANTAGE - Relates to hybrid caching architectures and
protocols for multi-processor computer systems. Optimises COMA-only and
NUMA-only cache architectures.
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Language: English Document Type: Conference Paper (PA) 
Treatment: Practical (P) 

Abstract: This paper describes a kernel interface that provides an 
untrusted user-level process (an executive) with protected access to
memory management functions, including the ability to create,
manipulate, and execute within subservient contexts (address spaces). Page
motion callbacks not only give the executive limited control over physical
memory management, but also shift certain responsibilities out of the
kernel, greatly reducing kernel state and complexity. The executive
interface was motivated by the requirements of the Wisconsin Wind Tunnel
(WWT), a system for evaluating cache-coherent shared-memory parallel
architectures. WWT uses the executive interface to implement a fine-grain 
user-level extension of Li's shared virtual memory on a Thinking Machines 
CM-5, a message-passing multicomputer. However, the interface is 
sufficiently general that an executive could act as a multiprogrammed 
operating system, exporting an alternative interface to the threads running 
in its subservient contexts. The executive interface is currently 
implemented as an extension to CMOST, the standard operating system for the 
CM-5. (16 Refs) 
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...Abstract: describes a kernel interface that provides an untrusted
user-level process (an executive) with protected access to memory
management functions, including the ability to create, manipulate, and
execute within subservient contexts (address spaces). Page motion callbacks
not only give the executive limited control over physical memory
management, but also shift certain responsibilities out of the kernel,
greatly reducing kernel state and complexity. The executive...
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Abstract: Asynchronous cell-based transmission is the preferred 
transmission mode for emerging high-speed network standards such as the 
IEEE 802.6 metropolitan-area-network standard and the CCITT broadband 
integrated services digital network. These networks are envisaged to 
operate at bit rates in excess of 100 Mbit/s. The high bit rate and the 
cell-based mode of transmission pose challenging requirements on 
memory-buffer management and the reassembly of packets from constituent 
cells. The paper describes hardware architecture and memory-management
techniques developed to achieve the required packet-reassembly functions
and buffer-memory management for a node operating in a high speed
asynchronous-transfer-mode-based network. The paper also discusses a number
of major generic issues addressed during the development. (15 Refs) 
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...Abstract: memory-buffer management and the reassembly of packets from 
constituent cells. The paper describes hardware architecture and
memory-management techniques developed to achieve the required
packet-reassembly functions and buffer-memory management for a node
operating in a high speed asynchronous-transfer-mode-based network. The
paper also discusses...
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Abstract: For pt. I see ibid., vol.24, no. 6, p. 1688-98 (1989). The authors
describe a memory management unit and a cache controller (MMU/CC) for a
40-70-MIPS multiprocessor workstation. The MMU/CC implements a novel
memory management scheme, in-cache address translation, which does not
require a translation lookaside buffer. It also implements a snooping bus
protocol to maintain data consistency across all caches in the system. The
chip is implemented in a 1.6-μm double-layer-metal CMOS technology and
is being used in a multiprocessor workstation (SPUR) successfully executing
a UNIX-like network-based operating system called Sprite as well as many
applications, including LISP programs. (18 Refs)
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Abstract: For pt.I see ibid., vol.24, no. 6, p. 1688-98 (1989). The authors 
describe a memory management unit and a cache controller (MMU/CC) for a 
40-70-MIPS multiprocessor workstation. The MMU/CC implements a novel 
memory management scheme, in-cache address translation, which does not 
require a translation lookaside buffer. It also implements a... 
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Abstract: The SPUR ('Symbolic Processing Using RISCs') Project is a 
research effort aimed at applying reduced instruction set computer concepts 
to the support of LISP programming environments. It extends previous 
Berkeley RISC efforts by exploring virtual memory, multiple processors, and 
co-processor support. The authors summarize the algorithms and 
implementation challenges of the cache controller chip. They briefly 
outline their protocol for cache coherency and mechanism for virtual 
memory management. They describe the on-chip performance monitoring
hardware and some of the implementation challenges and solutions. Finally,
the chip status and statistics are given. (4 Refs) 

Subfile: B C 

Descriptors: buffer storage; microprocessor chips; reduced instruction 
set computing; symbol manipulation 

Identifiers: SPUR cache controller chip; Symbolic Processing; RISCs; LISP 
programming environments; protocol; cache coherency; virtual memory 
management 

Class Codes: B1265F (Microprocessors and microcomputers); B1265D (Memory
circuits); C5130 (Microprocessor chips); C5220 (Computer architecture);
C5320G (Semiconductor storage)



Author(s): Wood, D.A.; Eggers, S.J.; Gibson, G.A.; Deog-Kyoon Jeong;
Katz, R.H.; Patterson, D.A. 

...Abstract: of the cache controller chip. They briefly outline their
protocol for cache coherency and mechanism for virtual memory management.
They describe the on-chip performance monitoring hardware and some of the
implementation challenges and solutions. Finally...
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Abstract: Asynchronous cell-based transmission is the preferred
transmission mode for emerging high-speed network standards such as the 
IEEE 802.6 metropolitan area network standard and the CCITT broadband 
integrated services digital network. These networks are envisaged to 
operate at bit rates in excess of 100 Mbit/s. The high bit rate and the 
cell-based mode of transmission pose challenging requirements on 
memory-buffer management and the reassembly of packets from constituent 
cells. The paper describes hardware architecture and memory-management
techniques developed to achieve the required packet-reassembly functions
and buffer-memory management for a node operating in a high speed
asynchronous-transfer-mode-based network. The paper also discusses a number
of major generic issues addressed during the development. (Author abstract)
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...Abstract: memory-buffer management and the reassembly of packets from
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Abstract: The authors describe a memory management unit and a cache
controller (MMU/CC) for a 40-70-MIPS multiprocessor workstation. The MMU/CC
implements a novel memory management scheme, in-cache address
translation, which does not require a translation lookaside buffer. It also
implements a snooping bus protocol to maintain data consistency across all
caches in the system. The chip is implemented in a 1.6-μm
double-layer-metal CMOS technology and is being used in a multiprocessor 
workstation (SPUR) successfully executing a UNIX-like network-based 
operating system called Sprite as well as many applications, including LISP 
programs. 18 Refs. 
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Abstract: Distributions of segment sizes measured under routine operating
conditions on a computer system which utilizes variable sized segments (the
Burroughs B5500) are discussed. The most striking feature of the
measurements is the large number of small segments - about 60% of the
segments in use contain less than 40 words. Although the results are
certainly not installation independent, and although they are particularly
influenced by features of the B5500 ALGOL system, they should be relevant
to the design of new computer systems, especially with respect to the
organization of paging schemes.
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Abstract: Hardware trends have produced an increasing disparity between
processor speeds and memory access times. While a variety of techniques
for tolerating or reducing memory latency have been proposed, these are 
rarely successful for pointer-manipulating programs. 

This paper explores a complementary approach that attacks the 
source (poor reference locality) of the problem rather than its 
manifestation (memory latency). It demonstrates that careful data
organization and layout provides an essential mechanism to improve the
cache locality of pointer-manipulating programs and consequently, their
performance. It explores two placement techniques, clustering and
coloring, that improve cache performance by increasing a pointer
structure's spatial and temporal locality, and by reducing
cache conflicts.

To reduce the cost of applying these techniques, this paper
discusses two strategies, cache-conscious reorganization and
cache-conscious allocation, and describes two semi-automatic
tools, ccmorph and ccmalloc, that use these strategies to produce
cache-conscious pointer structure layouts. ccmorph is a transparent
tree reorganizer that utilizes topology information to cluster and
color the structure. ccmalloc is a cache-conscious heap allocator that
attempts to co-locate contemporaneously accessed data elements in the
same physical cache block. Our evaluations, with microbenchmarks,
several small benchmarks, and a couple of large real-world
applications, demonstrate that the cache-conscious structure layouts
produced by ccmorph and ccmalloc offer large performance benefits, in
most cases significantly outperforming state-of-the-art prefetching.
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Abstract: Asynchronous cell-based transmission is the preferred
transmission mode for emerging high-speed network standards such as the
IEEE 802.6 metropolitan area network standard and the CCITT broadband 
integrated services digital network. These networks are envisaged to 
operate at bit rates in excess of 100 Mbit/s. The high bit rate and the 
cell-based mode of transmission pose challenging requirements on 
memory-buffer management and the reassembly of packets from constituent 
cells. The paper describes hardware architecture and memory-management
techniques developed to achieve the required
packet-reassembly functions and buffer-memory management for a node
operating in a high speed asynchronous-transfer-mode-based network. The 
paper also discusses a number of major generic issues addressed during 
the development. 
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Document type: journal article Language: English 

Record type: Abstract 

ISSN: 0163-5964 

ABSTRACT:

This paper investigates hardware support for fine-grain distributed shared 
memory (DSM) in networks of workstations. To reduce design time and 
implementation cost relative to dedicated DSM systems, the authors decouple 
the functional hardware components of DSM support, allowing greater use of
off-the-shelf devices. They present two decoupled systems, Typhoon-0 and
Typhoon-1. Typhoon-0 uses an off-the-shelf protocol processor and network
interface; a custom access control device is the only DSM-specific
hardware. To demonstrate the feasibility and simplicity of the access
control device, they designed and built an FPGA-based version in under one
year. Typhoon-1 also uses an off-the-shelf protocol processor, but
integrates the network interface and access control devices for higher 
performance. They compare the two decoupled systems with two integrated 
systems via simulation. For six benchmarks on 32 nodes, Typhoon-0 ranges 
from 30 % to 309 % slower than the best integrated system, while Typhoon-1 
ranges from 13 % to 132 % slower. Four of the six benchmarks achieve 
speedups of 12 to 18 on Typhoon-0 and 15 to 26 on Typhoon-1, compared with 
19 to 35 on the best integrated system. Two benchmarks are hampered by high 
communication overheads, but selectively replacing shared-memory operations 
with message passing provides speedups of at least 16 on both decoupled 
systems. These speedups indicate that decoupled designs can potentially
provide a cost-effective alternative to complex high-end DSM systems. 
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ABSTRACT: 

Historically, processor accesses to memory-mapped device registers have 
been marked uncachable to ensure their visibility to the device. The
ubiquity of snooping cache coherence, however, makes it possible for 
processors and devices to interact with cachable, coherent memory 
operations. Using coherence can improve performance by facilitating burst 
transfers of whole cache blocks and reducing control overheads (e.g., for 
polling). This paper begins an exploration of network interfaces (NIs) that
use coherence - coherent network interfaces (CNIs) - to improve
communication performance. The authors restrict this study to NI/CNIs that
reside on coherent memory or I/O buses, to NI/CNIs that are much simpler
than processors, and to the performance of fine-grain messaging from user
process to user process. Their first contribution is to develop and
optimize two mechanisms that CNIs use to communicate with processors. A
cachable device register - derived from cachable control registers - is a 
coherent, cachable block of memory used to transfer status, control, or 
data between a device and a processor. Cachable queues generalize cachable 
device registers from one cachable, coherent memory block to a contiguous 
region of cachable, coherent blocks managed as a circular queue. Their 
second contribution is a taxonomy and comparison of four CNIs with a more 
conventional NI. Microbenchmark results show that CNIs can improve the
round-trip latency and achievable bandwidth of a small 64-byte message by 
37 % and 125 % respectively on the memory bus and 74 % and 123 % 
respectively on a coherent I/O bus. Experiments with five macrobenchmarks 
show that CNIs can improve the performance by 17-53 % on the memory bus and 
30-88 % on the I/O bus. 
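
The cachable queue described above, a contiguous region of cache blocks managed as a circular queue, can be modeled in software roughly as follows. This sketch only shows the queue discipline; it does not model coherence traffic, and the 64-byte block size (borrowed from the message size quoted in the abstract) and all names are assumptions.

```python
BLOCK_SIZE = 64  # assumed block size in bytes


class CachableQueue:
    """Hypothetical circular queue of fixed-size blocks."""

    def __init__(self, num_blocks: int):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks
        self.head = 0   # next block the consumer reads
        self.tail = 0   # next block the producer fills
        self.count = 0  # number of occupied blocks

    def enqueue(self, payload: bytes) -> bool:
        if len(payload) > BLOCK_SIZE:
            raise ValueError("payload larger than one block")
        if self.count == len(self.blocks):
            return False  # queue full; the producer must retry later
        # Pad to a whole block so each message occupies exactly one block.
        self.blocks[self.tail] = payload.ljust(BLOCK_SIZE, b"\0")
        self.tail = (self.tail + 1) % len(self.blocks)
        self.count += 1
        return True

    def dequeue(self):
        if self.count == 0:
            return None  # queue empty
        block = self.blocks[self.head]
        self.head = (self.head + 1) % len(self.blocks)
        self.count -= 1
        return block
```

Because each message fills one whole block, a producer and a consumer exchange data in block-sized units, which is the granularity at which burst transfers of cache blocks would apply.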
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ABSTRACT: 

Massively parallel processors have begun using commodity operating systems 
that support demand-paged virtual memory. To evaluate the utility of 
virtual memory, the authors measured the behavior of seven shared-memory 
parallel application programs on a simulated distributed shared-memory 
machine. The results (i) confirm the importance of gang CPU scheduling, 
(ii) show that a page-faulting processor should spin rather than invoke a
parallel context switch, (iii) show that the parallel programs frequently 
touch most of their data, and (iv) indicate that memory, not just CPUs, 
must be 'gang scheduled'. Overall, the experiments demonstrate that demand 
paging has limited value on concurrent parallel machines because of the 
applications' synchronization and memory reference patterns and the
machines' high page-fault and parallel-context-switch overhead. 
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ABSTRACT:

Modern distributed memory parallel computers provide hardware support for
the efficient and reliable delivery of interprocessor messages. This
facility needs to be accessed by lightweight protocols that do not waste
the performance of the underlying hardware; the heavyweight layering
techniques traditionally used in distributed systems are wholly
inappropriate. A low-level communication interface is therefore presented
which exploits modern architectures effectively, while maintaining a good
match to existing parallel programming environments. The interface defines
mechanisms to access an asynchronous reliable packet delivery service. It
permits messaging protocols to be efficiently synthesized by considering 
the activity at their end-points alone. This arrangement effectively 
decouples the implementation of protocols from low-level architectural 
features, and hence aids the portability of parallel programming 
environments. Furthermore, the interface allows the communication network 
to be shared by multiple programming paradigms, giving additional 
flexibility over existing systems. 
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ABSTRACT: 

This paper discusses implementations of fine-grain memory access control, 
which selectively restricts reads and writes to cache-block-sized memory 
regions. Fine-grain access control forms the basis of efficient 
cache-coherent shared memory. This paper focuses on low-cost 
implementations that require little or no additional hardware. These 
techniques permit efficient implementation of shared memory on a wide range 
of parallel systems, thereby providing shared-memory codes with a 
portability previously limited to message passing. This paper categorizes 
techniques based on where access control is enforced and where access
conflicts are handled. The authors incorporated three techniques that 
require no additional hardware into Blizzard, a system that supports 
distributed shared memory on the CM-5. The first adds a software lookup 
before each shared-memory reference by modifying the program's executable. 
The second uses the memory's error correcting code (ECC) as cache-block 
valid bits. The third is a hybrid. The software technique ranged from 
slightly faster to two times slower than the ECC approach. Blizzard's 
performance is roughly comparable to a hardware shared-memory machine. 
These results argue that clusters of workstations or personal computers 
with networks comparable to the CM-5's will be able to support the same
shared-memory interfaces as supercomputers. 
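
As a rough illustration of the first technique (a software lookup before each shared-memory reference), the sketch below keeps one access tag per cache-block-sized region and checks it on every load and store. The class names, the three access states, and the 32-byte block size are assumptions made for this example; they are not taken from Blizzard.

```python
from enum import Enum

BLOCK_SIZE = 32  # bytes per cache block; an assumed value for illustration


class Access(Enum):
    INVALID = 0
    READ_ONLY = 1
    READ_WRITE = 2


class AccessFault(Exception):
    """Raised when a reference conflicts with the block's access state."""


class SharedMemory:
    """Hypothetical shared region guarded by per-block access tags."""

    def __init__(self, size):
        self.data = bytearray(size)
        # One access tag per cache-block-sized region.
        self.tags = [Access.INVALID] * ((size + BLOCK_SIZE - 1) // BLOCK_SIZE)

    def _check(self, addr, is_write):
        # The software lookup performed before each shared-memory reference.
        tag = self.tags[addr // BLOCK_SIZE]
        if tag is Access.INVALID or (is_write and tag is Access.READ_ONLY):
            raise AccessFault(f"addr {addr}: block is {tag.name}")

    def load(self, addr):
        self._check(addr, is_write=False)
        return self.data[addr]

    def store(self, addr, value):
        self._check(addr, is_write=True)
        self.data[addr] = value
```

In a real system the fault would be handled by a coherence protocol that fetches the block and upgrades its tag; here it simply surfaces as an exception.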
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ABSTRACT: 

The authors present a data-race-free-1 shared-memory model that unifies
four earlier models: weak ordering, release consistency (with sequentially
consistent special operations), the VAX memory model, and data-race-free-0.
Data-race-free-1 unifies the models of weak ordering, release consistency,
the VAX, and data-race-free-0 by formalizing the intuition that if programs
synchronize explicitly and correctly, then sequential consistency can be
guaranteed with high performance in a manner that retains the advantages of
each of the four models. Data-race-free-1 expresses the programmer's
interface more explicitly and formally than weak ordering and the VAX, and
allows an implementation not allowed by weak ordering, release consistency,
or data-race-free-0. The implementation proposal for data-race-free-1
differs from earlier implementations by permitting the execution of all 
synchronization operations of a processor even while previous data 
operations of the processor are in progress. To ensure sequential 
consistency, two synchronizing processors exchange information to delay
later operations of the second processor that conflict with an incomplete 
data operation of the first processor. 
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ABSTRACT: 

Asynchronous cell-based transmission is the preferred transmission mode for
emerging high-speed network standards such as the IEEE 802.6
metropolitan-area-network standard and the CCITT broadband integrated
services digital network. These networks are envisaged to operate at bit
rates in excess of 100 Mbit/s. The high bit rate and the cell-based mode of
transmission pose challenging requirements on memory-buffer management and
the reassembly of packets from constituent cells. The paper describes
hardware architecture and memory-management techniques developed to
achieve the required packet-reassembly functions and buffer-memory
management for a node operating in a high speed
asynchronous-transfer-mode-based network. The paper also discusses a number
of major generic issues addressed during the development.
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ABSTRACT: 

...memory-buffer management and the reassembly of packets from constituent
cells. The paper describes hardware architecture and memory-management
techniques developed to achieve the required packet-reassembly functions
and buffer-memory management for a node operating in a high speed
asynchronous-transfer-mode-based network. The paper also discusses...
...IDENTIFIERS: RECEIVER; CELL BASED TRANSMISSION; HIGH SPEED NETWORK
STANDARDS; CCITT BROADBAND INTEGRATED SERVICES DIGITAL NETWORK; MEMORY
BUFFER MANAGEMENT; MEMORY MANAGEMENT TECHNIQUES; PACKET REASSEMBLY
FUNCTIONS; MAN...



4/5,K/18 (Item 8 from file: 95) 

DIALOG (R) File 95 : TEME-Technology & Management 
(c) 2004 FIZ TECHNIK. All rts. reserv.

00615505 E92103213025 

Sorting, measures of disorder, and worst-case performance 

(Sortierung, Messung der Unordnung und Leistung unter schlechtester
Bedingung)

Estivill-Castro, V; Wood, D

York Univ., CDN; Univ. of Waterloo, CDN 

New Results and New Trends in Computer Science, Graz, A, June 20-21, 1991 
1991 

Document type: Conference paper Language: English 

Record type: Abstract 

ISBN: 3-540-54869-6; 0-387-54869-6

ABSTRACT: 

We design main memory algorithms that sort in worst-case time such that the
time varies smoothly from linear to optimal time for files that vary from
being nearly sorted and having little disorder to being very unsorted. They
are adaptive sorting algorithms. To this end, we: A) introduce three basic
ways of measuring nearly sortedness or disorder; B) give a sorting
algorithm, Strip Sort, that is worst-case-optimally adaptive to one of
them; C) give an axiomatic definition of measures of disorder; and D)
provide an infinitude of sorting algorithms that can be proved to be
adaptive, with respect to many different measures, under some reasonable
assumptions.
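
To make "measure of disorder" concrete, the sketch below computes two classic measures from the adaptive-sorting literature: the number of inversions and the number of ascending runs. They are offered only as familiar examples; the abstract does not say which three measures the paper introduces, and the function names are chosen here for illustration.

```python
def inversions(seq):
    """Count pairs (i, j) with i < j and seq[i] > seq[j] (quadratic, for clarity)."""
    n = len(seq)
    return sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])


def runs(seq):
    """Count maximal ascending runs; a sorted non-empty sequence has exactly one."""
    if not seq:
        return 0
    # A new run starts at every position where the sequence steps down.
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a > b)
```

Both measures are smallest on sorted input and grow as the input becomes more disordered, which is the property an adaptive sorting algorithm exploits.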
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ABSTRACT: As VLSI technology improvements continue to widen the gap 
between processor and main memory cycle times, cache performance becomes 
increasingly important to overall system performance. Cache memories help 
alleviate the cycle-time disparity, although only for programs that exhibit 
sufficient spatial and temporal locality. Programs with unruly access 
patterns consume a lot of time transferring data to and from the cache. To
fully exploit the performance potential of fast processors, programmers 
must explicitly consider cache behavior, restructuring their codes to 
increase locality. As these fast processors proliferate, techniques for 
improving cache performance must move beyond the supercomputer and 
multiprocessor communities and into the mainstream of computing. In this 
article, the authors examine some of the techniques programmers can use to 
improve cache performance. They show how to use CProf, a cache profiler, 
to identify cache performance bottlenecks and gain insight into their 
origin. This insight helps programmers understand which of the well-known 
program transformations are likely to improve cache performance. Using 
CProf and simple transformations, they show how to tune the cache 
performance of six of the SPEC92 benchmarks. By restructuring the source 
code, the benchmarks greatly improve cache behavior and achieve execution 
time speedups ranging from 1.02 to 3.46. The speedup depends on the 
machine's memory system, with greater speedups obtained in the Fortran 
programs. Copyright 1994, IEEE. 
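
The effect of access order on locality can be sketched with a toy direct-mapped cache model. This is not CProf and does not reproduce the article's measurements; the cache geometry (64 lines of 32 bytes) and the 8-byte element size are assumptions chosen for illustration.

```python
def count_misses(addresses, num_lines=64, line_size=32):
    """Count misses in a direct-mapped cache for a sequence of byte addresses."""
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        block = addr // line_size
        index = block % num_lines
        if tags[index] != block:
            tags[index] = block  # fill the line with the new block
            misses += 1
    return misses


def traversal(rows, cols, elem_size=8, row_major=True):
    """Yield byte addresses for visiting a rows x cols array stored row by row."""
    if row_major:
        for i in range(rows):
            for j in range(cols):
                yield (i * cols + j) * elem_size
    else:
        # Column order: consecutive accesses are a full row apart in memory.
        for j in range(cols):
            for i in range(rows):
                yield (i * cols + j) * elem_size
```

Comparing `count_misses(traversal(256, 256))` with `count_misses(traversal(256, 256, row_major=False))` shows why interchanging loops so that the inner loop walks contiguous memory is one of the simple transformations that can help.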
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Abstract: Massively parallel processors have begun using commodity 

operating systems that support demand-paged virtual memory. To evaluate 
the utility of virtual memory, the authors measured the behavior of 
seven shared memory parallel application programs on a simulated 
distributed-shared-memory machine. The results (1) confirm the
importance of gang CPU scheduling, (2) show that a page-faulting
processor should spin rather than invoke a parallel context switch, (3)
show that the parallel programs frequently touch most of their data,
and (4) indicate that memory, not just CPUs, must be "gang
scheduled". Overall, the experiments demonstrate that demand paging
has limited value on current parallel machines because of the
applications' synchronization and memory reference patterns and the
machines' high page-fault and parallel-context-switch overheads. 
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Abstract: Recent distributed shared memory (DSM) systems and proposed 
shared-memory machines have implemented some or all of their cache 
coherence protocols in software. One way to exploit the flexibility of 
this software is to tailor a coherence protocol to match an 
application's communication patterns and memory semantics. This paper 
presents evidence that this approach can lead to large performance 
improvements. It shows that application-specific protocols 
substantially improved the performance of three application 
programs (appbt, em3d, and barnes) over carefully tuned transparent
shared memory implementations. The speed-ups were obtained on Blizzard, 
a fine-grained DSM system running on a 32-node Thinking Machines CM-5. 

Major Descriptors: MEMORY MANAGEMENT — PERFORMANCE; * SUPERCOMPUTERS -- 
MEMORY MANAGEMENT 

Descriptors: COMPUTER CODES; DATA TRANSMISSION; DISTRIBUTED DATA PROCESSING;
PARALLEL PROCESSING; SYNCHRONIZATION

Broader Terms: COMMUNICATIONS; COMPUTERS; DATA PROCESSING; DIGITAL
COMPUTERS; PROCESSING; PROGRAMMING

Subject Categories: 990200* — Mathematics & Computers 
Author(s): Hill, M.D...

...Wood, D.A. (Univ. of Wisconsin, Madison, WI (United States). Computer
Sciences Dept.)

Major Descriptors: MEMORY MANAGEMENT -- PERFORMANCE...
...SUPERCOMPUTERS -- MEMORY MANAGEMENT



4/5,K/24 (Item 3 from file: 103) 

DIALOG (R) File 103: Energy SciTec 

(c) 2004 Contains copyrighted material . All rts. reserv. 
02958383 NOV-90-037043 ; EDB-90-175627 

Title: Cache considerations for multiprocessor programmers

Author(s): Hill, M.D.; Larus, J.R. (Computer Sciences Dept., Univ. of
    Wisconsin, Madison, WI (US))
Source: Communications of the ACM (Association of Computing Machinery)
    (USA) v33:8. Coden: CACMA ISSN: 0001-0782
Publication Date: Aug 1990
p 97-102

Contract Number (Non-DOE): MIPS-8957278; CCR-8902536

Document Type: Journal Article 

Language: In English 

Journal Announcement: EDB9023 

Subfile: ETD (Energy Technology Data Exchange). NOV (DOE contractor) 
US DOE Project/NonDOE Project: NP 
Country of Origin: United States 
Country of Publication: United States 

Abstract: Although caches in most computers are invisible to programmers, 
they significantly affect program performance. This is particularly
true for cache-coherent, shared-memory multi-processors. This article 
presents recent research into the performance of parallel programs and 
its implications for programmers. 
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Abstract: This paper describes a memory management unit and a cache 
controller (MMU/CC) for a shared memory multiprocessor. The MMU/CC 
implements a novel memory management scheme, called in-cache
address translation, that does not require a translation lookaside
buffer (TLB). It also implements a snooping bus protocol to maintain
data consistency across all caches in the system. Both chips are
implemented in a 1.6-{mu}m double-layer-metal CMOS technology, and are 
being used in a multiprocessor workstation (SPUR) successfully 
executing a UNIX-like network-based operating system called Sprite as 
well as many applications including LISP programs. 
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La technologie CMOS a tres, grande echjelle . d Mntegration permet d*i-ntegrer 
un puissant processeur dans une puce unique. On decrit un circuit realisant 
la fonction de gestion memoire et de commande de l'antememoire pour un
multiprocesseur a memoire partagee 
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ABSTRACT: The small number of massively parallel, shared-memory machines 
is due to the lack of a shared-memory programming performance model that 
can determine the cost of operations for programmers and the cases that are 
common for hardware designers, which lets them build simple hardware to 
optimize them. The cooperative shared memory approach to shared-memory 
design is described in an attempt to rectify this situation; the initial 
implementation uses a simple programming model called Check-In/Check-Out
(CICO) along with even simpler hardware called Dir1SW. Programs in CICO
bracket uses of shared data with a 'check_out' directive marking the
expected first use and a 'check_in' directive terminating the expected use
of the data. Communication latency is hidden with the aid of a cooperative
prefetch directive. The Dir1SW minimal directory protocol adds little
complexity to message-passing hardware but supports programs written within 
the CICO model efficiently. 
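
The bracketing pattern described above can be pictured as follows. This is a hypothetical sketch of the check-out/check-in idea only: the `Directory` class, the `checked_out` helper, and the processor and block identifiers are invented for illustration and do not reproduce the CICO directives or the Dir1SW protocol.

```python
from contextlib import contextmanager


class Directory:
    """Hypothetical record of which processors hold each block checked out."""

    def __init__(self):
        self.holders = {}  # block id -> set of processor ids

    def check_out(self, block, proc):
        self.holders.setdefault(block, set()).add(proc)

    def check_in(self, block, proc):
        self.holders.get(block, set()).discard(proc)


@contextmanager
def checked_out(directory, block, proc):
    """Bracket a use of shared data: check out before, check in after."""
    directory.check_out(block, proc)
    try:
        yield
    finally:
        directory.check_in(block, proc)
```

A use of shared data would then read `with checked_out(directory, block, proc): ...`, which makes the expected first and last use explicit in the program text.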
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; Concurrent Programming; Modeling; Memory management ; 

Multiprocessing; Cache memory 
FILE SEGMENT: AI File 88
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Page placement algorithms for large real-indexed caches. (Technical) 
Kessler, R.E.; Hill, Mark D. 

ACM Transactions on Computer Systems, v10, n4, p338(22)
Nov, 1992 

DOCUMENT TYPE: Technical ISSN: 0734-2071 LANGUAGE: ENGLISH 

RECORD TYPE: ABSTRACT 

ABSTRACT: Both paged virtual memory and caches are supported in most 
general-purpose computer systems. Most operating systems place pages by 
selecting an arbitrary page frame from a pool of page frames made available 
by the page replacement algorithm. A simple model is developed showing that 
naive, or arbitrary, page placement causes up to 30 percent unnecessary 
cache conflicts. Several page placement algorithms are presented, called 
careful-mapping algorithms, that select a page frame from the pool of
available page frames that will most likely reduce cache contention. 
Trace-driven simulation shows that dynamic cache misses can be reduced by 
between 10 and 20 percent with careful mapping over naive mapping in a 
direct-mapped real-indexed multimegabyte cache. Careful-mapping algorithms 
are an inexpensive way to improve cache performance when main memory is 
underused because it is easy to maintain a large available pool.
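
One way to picture the difference between naive and careful placement is sketched below. The heuristic shown (choose the available frame whose cache bin currently holds the fewest pages) is an assumption made for illustration, in the spirit of page coloring; it is not claimed to be any of the specific algorithms in the paper, and all names are hypothetical.

```python
from collections import Counter


def cache_bin(frame, num_bins):
    """Cache bin (color) of a page frame in a direct-mapped, real-indexed cache."""
    return frame % num_bins


def naive_place(available):
    """Arbitrary placement: take whichever frame is first in the pool."""
    return available.pop(0)


def careful_place(available, bin_load, num_bins):
    """Take the available frame whose cache bin currently holds the fewest pages."""
    frame = min(available, key=lambda f: (bin_load[cache_bin(f, num_bins)], f))
    available.remove(frame)
    bin_load[cache_bin(frame, num_bins)] += 1
    return frame
```

With a pool such as `[0, 4, 8, 1, 2, 3]` and four bins, naive placement hands out frames 0, 4, and 8, which all share bin 0, while the careful variant spreads successive pages across bins. A larger available pool gives the careful variant more frames to choose from.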
SPECIAL FEATURES: illustration; table; chart; graph 

DESCRIPTORS: Page Sizing; Research and Development; Algorithm Analysis;
    New Technique; Cache memory; Memory management; Virtual memory;
    Operating System
FILE SEGMENT: AI File 88
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Asynchronous transfer mode receiver. (Technical) 

Hill, M.; Cantoni, A.; Moors, T.
IEE Proceedings Part E Computers and Digital Techniques, v139, n5, p401(9)
Sept, 1992

DOCUMENT TYPE: Technical ISSN: 0143-7062 LANGUAGE: ENGLISH 

RECORD TYPE: ABSTRACT 

ABSTRACT: Asynchronous cell-based transmission is the preferred 
transmission mode for emerging high-speed network standards such as the 
IEEE 802.6 metropolitan area network standard and the CCITT broadband 
integrated services digital network. These networks are envisaged to 
operate at bit rates in excess of 100 Mbit/s. The high bit rate and the
cell-based mode of transmission pose challenging requirements on
memory-buffer management and the reassembly of packets from constituent
cells. The paper describes hardware architecture and memory-management
techniques developed to achieve the required packet-reassembly functions
and buffer-memory management for a node operating in a high speed
asynchronous-transfer-mode-based network. The paper also discussed a number 
of major generic issues addressed during the development. (Reprinted by 
permission of the publisher.) 
SPECIAL FEATURES: illustration; chart; table 

DESCRIPTORS: Memory Management; Asynchronous; Communications Modes;
    Packet Switch; Networks
FILE SEGMENT: CD File 275

Hill, M...

...ABSTRACT: memory-buffer management and the reassembly of packets from
constituent cells. The paper describes hardware architecture and
memory-management techniques developed to achieve the required
packet-reassembly functions and buffer-memory management for a node
operating in a high speed asynchronous-transfer-mode-based network. The
paper also discussed...
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The perfect marriage? (Roderick Manhattan and Associates ' Draf ix Windows 
CAD software package) (Software Review) (evaluation) 
Wood, David 

3D, n28, p29(1)
August, 1990 

DOCUMENT TYPE: evaluation ISSN: 0953-2331 LANGUAGE: ENGLISH 

RECORD TYPE: FULLTEXT; ABSTRACT 

WORD COUNT: 807 LINE COUNT: 00065

ABSTRACT: Roderick Manhattan and Associates' Drafix Windows CAD is an 
easy-to-use computer-aided design software package. The package's features 
include a well-designed users' manual, simple and quick installation 
procedures, straightforward drawing and editing, and a large selection of 
line terminators. The disadvantages of the package include a cramped screen 
and the inability of the Escape key to stop a command. The package requires 
a 286-, 386-, or 486-based PC with 1Mbyte of RAM. The cost is 695 pounds 
sterling. The package's capabilities can be improved by the addition of 
Microsoft Windows 3.0, at a cost of 99 pounds sterling. 
CAPTIONS: 3D verdict. (table)
SPECIAL FEATURES: illustration; table 

COMPANY NAMES: Roderick Manhattan and Associates — Products 
DESCRIPTORS: Graphics Software; Evaluation; Computer-Aided Design
SIC CODES: 7372 Prepackaged software 

TRADE NAMES: Drafix Windows CAD (CAD software) — evaluation 

OPERATING PLATFORM: Intel 80286; Intel 80386; Intel 80486; MS Windows 
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Wood , David 

the ultimate success of the package will depend on the performance
of the new environment, its improved memory management, however, means
extended memory above 1Mbyte, and so faster file loading, zooms and
redraws.

In general Drafix... 
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Port Windows applications to OS/2 (almost) painlessly with the Software 
Migration Kit

Fogelin, Eric; Wood, David ; Bergman, Noel 
Microsoft Systems Journal, v5, n6, p21(10) 
Nov, 1990 

ISSN: 0889-9932 LANGUAGE: ENGLISH RECORD TYPE: FULLTEXT; ABSTRACT 

WORD COUNT: 4150 LINE COUNT: 00332 

ABSTRACT: Microsoft Corp's Microsoft Windows to OS/2 Software Migration 
Kit (SMK) is a set of software tools that makes porting applications from 
the Microsoft Windows graphical environment to OS/2 Presentation Manager 
(PM) much easier. SMK acts as an extension to the Microsoft Windows 
Software Development Kit Version 3.0 (SDK). SMK reduces conversion time for 
large applications from months to days with its mapping layer that 
translates Windows function calls into OS/2 function calls. The code layer 
is implemented as a set of OS/2 dynamic-link libraries (DLLs). It sits 
between PM and the Windows application and translates calls to the Windows
application program interface (API) at run time. The mapping layer accepts
the Windows API call, interprets it and reorders and converts the 
parameters. It then calls the corresponding PM API. Details on using the 
SMK are presented. 

CAPTIONS: Software Migration Kit mapping layer. (chart); Beta SMK
contents. (chart); Windows functions unsupported in SMK. (chart)

SPECIAL FEATURES: illustration; chart 
COMPANY NAMES: Microsoft Corp. — Products 

DESCRIPTORS: Software Migration; Application Development Software; Coding 
; GUI 

TICKER SYMBOLS: MSFT

TRADE NAMES: Microsoft Windows (GUI) -- Programming; Microsoft Windows to
    OS/2 Software Migration Kit (Program development software) -- Usage
OPERATING PLATFORM: OS/2 PM; MS Windows 
FILE SEGMENT: CD File 275 

...Wood, David

...Windows API functions are supported in the SMK mapping layer, with
the exception of sound, 32-bit memory management, some Graphic Device
Interface (GDI) functions, and a few other functions (see Figure 8). These
functions must...
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Evaluating associativity in CPU caches, (technical) 

Hill, Mark D. ; Smith, Alan Jay 
IEEE Transactions on Computers, v38, n12, p1612(19)
Dec, 1989 

DOCUMENT TYPE: technical ISSN: 0018-9340 LANGUAGE: ENGLISH 

RECORD TYPE: ABSTRACT 

ABSTRACT: Cache memories are generally designed to be direct-mapped or 
set-associative as large fully-associative caches are usually infeasible 
and/or too expensive. Efficient new algorithms for the simulation of 
alternative direct-mapped and set-associative caches are presented, and the 
uses of those algorithms to quantify the effect of limited associativity on 
the cache miss ratio are described. Three important cache parameters are 
cache size, block/line size and associativity, the last being emphasized 
here.
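
For orientation, the sketch below is a plain trace-driven simulator of one LRU set-associative cache. It is a minimal illustration of how associativity affects the miss ratio, not the efficient algorithms the paper presents; the function name and the choice of LRU replacement are assumptions for this example.

```python
from collections import OrderedDict


def miss_ratio(block_trace, num_blocks, assoc):
    """Miss ratio of an LRU cache with num_blocks blocks and the given associativity."""
    num_sets = num_blocks // assoc
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in block_trace:
        s = sets[block % num_sets]
        if block in s:
            s.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            if len(s) == assoc:
                s.popitem(last=False)  # evict the least recently used block
            s[block] = True
    return misses / len(block_trace)
```

Setting `assoc=1` gives a direct-mapped cache and `assoc=num_blocks` a fully-associative one, so the same routine can compare the designs the abstract mentions on a given trace of block numbers.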

CAPTIONS: Set-associative mapping. (chart); Data on traces. (table); Miss
ratios for five-trace workload with caches of four associativities. (graph)

SPECIAL FEATURES: illustration; chart; table; graph 

DESCRIPTORS: Cache Memory; Algorithm Analysis; Processor Architecture;
Associative Memory; Simulation of Computer Systems; CPU; Memory
Management; Research and Development; Memory Mapping; New Technique

FILE SEGMENT: AI File 88

Hill, Mark D...

...DESCRIPTORS: Memory Management;



4/5,K/34 (Item 1 from file: 647) 

DIALOG (R) File 647: CMP Computer Fulltext 
(c) 2004 CMP Media, LLC. All rts. reserv.

01116582 CMP ACCESSION NUMBER: EET19970120S0074
Euphony: a signal processor for ATM

Peter Z. Onufryk, Principal Technical Staff Member, Consumer Electronics
Research Department, AT&T Labs, Murray Hill, N.J.
ELECTRONIC ENGINEERING TIMES, 1997, n 937, PG54



Set     Items     Description
S1      72823     (TRANSLATION OR TABLE) () (LOOKASIDE OR LOOK () ASIDE) () BUFFER?
                  OR TLB OR MAP OR MAPPING OR MAPS OR MAPPED
S2      1003101   CONTEXT OR CURRENT () STATUS OR CONDITION OR MODE
S3      369497    VALID? OR AUTHENTICAT? OR VERIF? OR CERTIF? OR IDENTIF?
S4      1772779   MEMORY OR STORAGE OR CACHE? OR BUFFER?
S5      1444      S1 AND S2 AND S3
S6      17502     S2 (3N) S4
S7      279       S1 AND S6
S8      2168      S3 (2N) (FLAG? OR INDICATOR? OR POINTER?)
S9      3         S7 AND S8
S10     26        S7 AND S3
S11     26        S9 OR S10
S12     17        S11 AND IC=G06F?
S13     2         S5 AND (DEMAPPING OR DEMAP OR DEMAPS OR DEMAPPED)
S14     1         S13 NOT S11
S15     27        S11 OR S14
S16     3         S15 AND MC= (T01-F05E? OR T01-H01B3 OR T01-S03)
S17     18        S12 OR S14 OR S16



File 347:JAPIO Nov 1976-2003/Nov (Updated 040308)

(c) 2004 JPO & JAPIO 
File 350:Derwent WPIX 1963-2004/UD, UM &UP=200419 

(c) 2004 Thomson Derwent 



17/5/1 (Item 1 from file: 347) 

DIALOG (R) File 347 : JAPIO 

(c) 2004 JPO & JAPIO. All rts. reserv. 

06947911 **Image available** 
COMPUTER SYSTEM 

PUB. NO.: 2001-175463 [JP 2001175463 A]

PUBLISHED: June 29, 2001 (20010629)
INVENTOR(s): YAGI TATSUO

APPLICANT(s): MATSUSHITA ELECTRIC IND CO LTD
APPL. NO.: 11-357769 [JP 99357769]

FILED: December 16, 1999 (19991216) 

INTL CLASS: G06F-009/06 ; G06F-009/445 ; G06F-012/06 

ABSTRACT 

PROBLEM TO BE SOLVED: To provide a computer system for providing a boot 
program area, without suppressing program area by using a relatively 
smaller-scaled microcomputer, and validly utilizing a memory space. 

SOLUTION: This computer system is provided with a control means 1 for 
executing an instruction according to program data, a rewritable first 
memory means 8 for storing a boot program, a rewritable second memory means 
9 for storing an operation program, a mode switching means 7 for selecting 
either a boot mode or an operation mode, and a memory map converting
means 5 for switching the address of the first and second memory means
according to the mode selected by the mode switching means. The memory
map converting means sets the address of the first and second memory
means, so that the operation program can be written in the second memory 
means according to the boot program in the boot mode, and sets the address 
of the first and second memory means, so that a prescribed instruction can 
be executed by a control means according to the operation program in the 
operation mode. 

COPYRIGHT: (C) 2001, JPO
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DEVICE, SYSTEM, AND METHOD FOR PRINT PROCESSING AND COMPUTER- READABLE 
RECORDING MEDIUM WHERE PROGRAM ALLOWING COMPUTER TO IMPLEMENT SAME METHOD 
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PUB. NO.: 2000-194518 [JP 2000194518 A] 

PUBLISHED: July 14, 2000 (20000714) 
INVENTOR(s): NISHINOSONO MICHIAKI
APPLICANT(s): RICOH CO LTD
APPL. NO.: 10-376251 [JP 98376251] 

FILED: December 24, 1998 (19981224) 

INTL CLASS: G06F-003/12 ; B41J-029/46 

ABSTRACT

PROBLEM TO BE SOLVED: To efficiently print only an altered page, to reduce 
a waste of printing papers, and to save resources by comparing new print 
data with print data corresponding to identification information included 
in a print request stored in a storage means, page by page, and printing 
only the page wherein the new print data is detected. 

SOLUTION: A control part 127 performs a normal print process when mode data
received from a computer 110 together with print data indicates 'normal
print mode' and performs the normal print processing and also stores data
expanded into a bit map in a print data storage part 124 when it is a
'storage print mode'. When the data indicates a 'difference print mode',
only print data (bit map data) of a page having different contents
detected by a comparing process part 125 is outputted to a print part 126
and the print data stored in the print data storage part 124 are updated
with the print data of the page having the different contents.
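
The difference print mode can be sketched as a page-by-page comparison against the stored copy. The function names, the dictionary used as the print data storage, and the job identifier are hypothetical stand-ins for the storage part, comparing part, and identification information in the abstract.

```python
def pages_to_print(stored_pages, new_pages):
    """Return (page_number, bitmap) for each page that differs from the stored copy."""
    changed = []
    for number, bitmap in enumerate(new_pages, start=1):
        old = stored_pages[number - 1] if number <= len(stored_pages) else None
        if bitmap != old:
            changed.append((number, bitmap))
    return changed


def difference_print(storage, job_id, new_pages):
    """Select only the changed pages, then update the stored copy for job_id."""
    changed = pages_to_print(storage.get(job_id, []), new_pages)
    storage[job_id] = list(new_pages)
    return changed
```

Pages that were added beyond the end of the stored document are treated as changed, so they are printed as well.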

COPYRIGHT: (C) 2000, JPO 
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01-263851 [JP 1263851 A] 
October 20, 1989 (19891020) 
TAKAGI KATSUAKI 

HITACHI LTD [000510] (A Japanese Company or Corporation), JP (Japan)

63-091550 [JP 8891550] 
April 15, 1988 (19880415) 
[4] G06F-012/10 ; G06F-012/12 

45.2 (INFORMATION PROCESSING -- Memory Units); 42.4 
(ELECTRONICS — Basic Circuits) 

Section: P, Section No. 990, Vol. 14, No. 21, Pg. 122, 
January 17, 1990 (19900117) 



ABSTRACT 

PURPOSE: To simplify the constitution of a circuit which is used for 
control of the replacement of entries by adding a using condition bit 
control circuit to the outside of a memory array to prepare a using 
condition bit within the memory array and to apparently turn forward or 
reverse the using condition bit. 

CONSTITUTION: An address converting device (TLB) 1 contains a mask
circuit 11, an associative array 12, a coincident line processing circuit
13, and a data array 14. The array 12 includes an LA field 123 which holds
a valid bit 122, a using condition bit 121 and a part (to undergo the
address conversion) of a logical address. A mask pattern is produced by a
mask pattern generating circuit 15 under the control of a control signal 
34. A using condition bit control circuit consisting of a flip-flop 21 and 
an EOR gate 22 turns apparently forward or reverse the contents of the bit 
121. 
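
One reading of the flip-flop and EOR (XOR) gate arrangement is that the stored bit is interpreted through a global polarity, so every entry can be made to read as unused by toggling a single flip-flop. The sketch below models that reading only; the class, its methods, and the victim-selection policy are assumptions for illustration and are not taken from the patent abstract.

```python
class UsedBits:
    """Per-entry using-condition bits whose meaning can be inverted all at once."""

    def __init__(self, n):
        self.raw = [0] * n   # bits as stored in the array
        self.polarity = 0    # the flip-flop; XORed with each raw bit when read

    def is_used(self, i):
        return (self.raw[i] ^ self.polarity) == 1

    def mark_used(self, i):
        # Store the value that reads back as 1 under the current polarity.
        self.raw[i] = 1 ^ self.polarity

    def pick_victim(self):
        for i in range(len(self.raw)):
            if not self.is_used(i):
                return i
        # Every entry reads as used: flip the polarity so all of them read
        # as unused again, without rewriting any stored bit.
        self.polarity ^= 1
        return 0
```

The appeal of this arrangement is that resetting the replacement state costs one toggle instead of a write to every entry.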
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Virtual memory control method in computer system, involves setting 
translation entry mapping indicator for each entry associated with
given context of memory and demapping given context by changing set 
mapping indicator 

Patent Assignee: CASSIDAY D R (CASS-I); FEEHRER J R (FEEH-I); HILL M D
    (HILL-I); JACKSON C J (JACK-I); OSTROVSKY B (OSTR-I); PILLAI P (PILL-I);
    WOOD D A (WOOD-I)

Inventor: CASSIDAY D R; FEEHRER J R; HILL M D; JACKSON C J; OSTROVSKY B;
    PILLAI P; WOOD D A
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Abstract (Basic) : US 20030070058 Al 

NOVELTY - A translation entry mapping indicator (132) is set for 
each entry associated with the given context (134) of memory and a 
validity flag (130) is set for each entry associated with the given 
context. The given context is demapped by changing the mapping 
indicator set for each given context.

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is also included for 
memory management device; and program for controlling virtual memory in 
computer system. 

USE - For controlling physical memory of computer system.

ADVANTAGE - The time required to demap the given context is reduced 
without degrading the performance of the computer system. Thereby the 
memory management process is performed efficiently. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of 
memory management unit. 

context mapping indicator (122) 

clean-up indicator (124) 
validity flag (130) 

translation entry mapping indicator (132) 

context (134) 
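
A minimal sketch of the demapping idea, under stated assumptions: each translation entry records the mapping indicator its context had when the entry was installed, and an entry only hits while the two still match. The class and field names are hypothetical, and an incrementing integer stands in for the indicator so that stale entries can never match again; hardware with an indicator only a few bits wide would have to invalidate stale entries before reusing a value, and that clean-up is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    context: int
    vpage: int
    ppage: int
    valid: bool = True
    map_ind: int = 0   # translation entry mapping indicator


class TLB:
    def __init__(self):
        self.entries = []
        self.ctx_ind = {}  # context -> context mapping indicator

    def insert(self, context, vpage, ppage):
        ind = self.ctx_ind.setdefault(context, 0)
        self.entries.append(Entry(context, vpage, ppage, True, ind))

    def lookup(self, context, vpage):
        ind = self.ctx_ind.get(context, 0)
        for e in self.entries:
            if (e.valid and e.context == context
                    and e.vpage == vpage and e.map_ind == ind):
                return e.ppage
        return None

    def demap_context(self, context):
        # One update instead of a walk over every entry of the context.
        self.ctx_ind[context] = self.ctx_ind.get(context, 0) + 1
```

After `demap_context(c)`, every existing entry for context `c` stops matching at once, while entries of other contexts are unaffected.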
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Real-time access providing apparatus for flash memory device in cellular 
phone, includes control logic which accesses flash memory data, when 
flash condition is encountered 

Patent Assignee: GARNER R P (GARN-I); INTEL CORP (ITLC)

Inventor: GARNER R P; GARNER R 

Number of Countries: 100 Number of Patents: 003 
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Abstract (Basic) : US 20020136078 Al 



NOVELTY - A memory mapped input/output unit (12) is used to 
access flash memory data, when a particular condition is encountered. 
The condition is not a flash read command instruction. A control logic 
accesses the flash memory data, when the flash condition is 
encountered.

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are included for the
following:

(1) Flash memory device; and 

(2) Real-time access provision method. 

USE - For providing real-time access to flash memory device such as 
electrically erasable programmable read only memory used in cellular
phone.

ADVANTAGE - As the need to switch between modes to access flash
data while performing another function is eliminated, the performance
and speed are improved. As the memory mapped input/output registers 
hold various parameters needed to identify features of the flash 
device, the real-time access to flash memory is provided efficiently. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of 
logical mapping of flash configuration plane to memory mapped 
input/output portion. 

Input/output unit (12) 

pp; 14 DwgNo 1/6 
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Transmission apparatus e.g. ATM apparatus in communication network, maps 
identified extended cell which is output from controller onto payload 
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Abstract (Basic) : US 20020065073 Al 

NOVELTY - The multiplexing and demapping unit demultiplexes and 
demaps the extended ATM cells from a payload of frame signal received 
from a transmission path. A cell synchronizer executes a cell
synchronization processing to identify an extended cell boundary. The 
multiplexing and mapping unit multiplexes and maps an extended cell 
which is output from a controller onto a payload of another frame 
signal for transmission to the transmission path. 

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are included for the 
following : 

(1) Communication network; 

(2) SDH transmission apparatus; 

(3) Extended cell communication network; and

(4) Add/drop multiplexing apparatus.

USE - E.g. asynchronous transfer mode (ATM) transmission 
apparatus, synchronous digital hierarchy (SDH) transmission apparatus 
(claimed), add/drop multiplexer (claimed), ADM transmission apparatus 
in communication network (claimed) with extended cell communication
network, ATM cell transmission network, SONET/SDH network, local area
network (LAN), wide area network (WAN), IP network, photonic network
e.g. OADM, WDM network. 

ADVANTAGE - Extended ATM cells can be transmitted at high
speed through an SDH transmission apparatus, which makes it possible to
enhance the performance of the SDH transmission apparatus. The route
status of a bypass virtual path can be monitored at all times, thereby
making it possible to change over the virtual path when a
failure occurs.

DESCRIPTION OF DRAWING(S) - The figure shows the extended cell
communication network.
pp; 52 DwgNo 1/40
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File recording method involves fo arming cluster by recording file of 
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Patent Assignee: SONY CORP (SONY ) 

Number of Countries: 001 Number of Patents: 001 

Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 

JP 2001043112 A 20010216 JP 99217133 A 19990730 200216 B 

Priority Applications (No Type Date): JP 99217133 A 19990730
Patent Details: 

Patent No Kind Lan Pg Main IPC Filing Notes 
JP 2001043112 A 151 G06F-012/00

Abstract (Basic) : JP 2001043112 A 

NOVELTY - Cluster (UCl) is formed by recording file of arbitrary 
recording length (Lnl) at arbitrary positions (adlO) of the memory 
section (MemI). Data recording positional information (CAdT) showing 
the recording position of the cluster is recorded at a fixed position 
(adC) and protection information (Pdv, Pdw) for every cluster is 
recorded at arbitrary positions (adPLS). 

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the 
following: 

(a) File reproduction method; 

(b) Data write-in system; 

(c) Data reading system; 

(d) Data recording and reproducing apparatus; 

(e) Data write-in apparatus; 

(f) Data reader; 

(g) Recording medium.

USE - For data recording and reproducing apparatus. 

ADVANTAGE - Improves memory utilization efficiency by preventing
the generation of unused recording area and simplifies file control
operation. Reduces the area required for recording protection
information. Enables reliable access to the desired cluster.
Identifies recording position easily and protects file reliably.
Enables confirmation of the recording condition on the memory. Simplifies
acquisition of protection information and improves compatibility with
memories having different recording capacities. Improves data recording
and regeneration efficiency and enables easy confirmation of write-in
approval at arbitrary positions on the recording medium.

DESCRIPTION OF DRAWING (S) - The figure shows the explanatory 
drawing of the memory map of the memory section in which data is 
recorded by a data write-in system. (Drawing includes non-English
language text).

Recording positions (adc,adPLS) 

Positional information (CAdT) 

Protection information (Pdv,Pdw) 

Recording length (Lnl)

Memory section (Meml) 

Cluster (UCl) 
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Vault controller for securer multiple browser session in e-commerce uses 
context manager to activate Keys for transferring data in storage levels 
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URL \ / 
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Abstract (Basic) : CA 2310535 Al 

NOVELTY - A registration authority (20) reviews submitted requests
and the decision is provided to a certification agent (29). The vault
controller uses browser client authentication and the SSL protocol, with
the browser request encrypted and sent to the vault controller software
along with an access certificate. The vault supervisor validates
the access certificate and maps it to a user ID and password.

DETAILED DESCRIPTION - Embedded in the vault process is a context
manager that includes a data structure for storing global context, made up
of variables and their values, in memory, as well as a mechanism for
importing and exporting global context to external encrypted storage.
An external vault agent (28) running remotely incorporates a subset
of vault process functionality, enabling the agent to exchange secure
messages with a vault running under the vault controller. The vault
process is multithreaded, each thread having its own local storage
area that allows it to maintain 'state' across multiple browser-server
interactions. An INDEPENDENT CLAIM is also included for a context
manager within a vault process for maintaining state information
between successive user browser sessions.

USE - For secure browsing in e-commerce.

ADVANTAGE - It supports the creation, storage and retrieval of data
for state maintenance of vault processes in successive browser
connections as well as across multiple units of work and/or separate
applications.

DESCRIPTION OF DRAWING(S) - The figure shows a representation of a
vault controller in a secure end-to-end communication system
interacting with users, vault agents and registration authorities.
pp; 24 DwgNo 1/4
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Abstract (Basic) : EP 797149 A 

The address translation control circuit includes several context
storage elements. The first one contains a first context number, and a
second one a second context number. Circuitry coupled to the storage
elements outputs a translation Hit signal.

The Hit signal indicates that the translation look aside
buffer (308) is currently storing the physical address when the
context identification number equals a selected context number, the
selected context number being either the first or the second
context number, and the pre-stored virtual address is equivalent to the
requested virtual address.

ADVANTAGE - Improves performance of electronic addressing by 
sharing translation table entries of translation look aside 
buffer.
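
The hit condition described in this abstract can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the names `TlbEntry` and `translation_hit` and the field layout are hypothetical and do not come from EP 797149.

```python
from dataclasses import dataclass


@dataclass
class TlbEntry:
    """One translation entry (hypothetical layout, for illustration only)."""
    context_id: int     # context identification number stored with the entry
    virtual_page: int   # pre-stored virtual address
    physical_page: int  # physical address held by the entry


def translation_hit(entry, requested_virtual_page, first_context, second_context):
    """Model of the Hit signal: the entry's context must equal either of the
    two context numbers AND the stored virtual address must match the request."""
    context_matches = entry.context_id in (first_context, second_context)
    address_matches = entry.virtual_page == requested_virtual_page
    return context_matches and address_matches


entry = TlbEntry(context_id=7, virtual_page=0x1234, physical_page=0x00AB)
print(translation_hit(entry, 0x1234, first_context=3, second_context=7))
```

Because either context number can satisfy the comparison, two contexts can use the same entry, which is the sharing of translation table entries that the ADVANTAGE line refers to.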
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Abstract (Basic) : US 5586283 A 

In a computer system comprising a processor and memory, accesses to
memory are performed by issuing a virtual address to memory. An apparatus
for performing a translation from a virtual address to a physical
address has a translation look aside buffer comprising a page 
table memory comprising a number of levels of a page table, an initial 
level of the page table being identified as a root level, the page 
table memory storing page table pointers (PTPs) which provide a base 
address of a table in a next higher level of a page table and page 
table entries (PTEs) which provide information to translate the virtual 
address to the physical address. A tag memory comprises tags, which 
comprise identification of PTEs and PTPs. The tags also comprise 
virtual PTP tags for PTPs located in at least one predetermined higher 
level that is higher than the root level, and provide a pointer to a 
corresponding entry in the page table. 

A select mechanism is coupled to receive the virtual address and 
context of the memory access, and generates a compare virtual PTP 
tag if a TLB miss occurs when trying to access a tag identifying a 
PTE corresponding to the virtual address. The compare virtual PTP tag 
is generated from the context of the memory address and a 
predetermined portion of the virtual address. The compare virtual PTP 
tag is compared to stored virtual PTP tags stored in the tag memory 
such that if the compared virtual PTP tag and one of the stored virtual 
PTP tags match, the select mechanism provides a pointer to the corresponding
PTP at the predetermined higher level of the page table without
performing a page table walk initiating at the root level through the
lower level page tables.



ADVANTAGE - Time expended for performing a page table walk is
minimized. Decreases latencies caused by accesses to increasing levels
of page tables during table walk of page table, by eliminating time
required to walk through root and first levels of page table.
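
The shortcut described in this abstract can be illustrated with a small Python model. It is a sketch under assumptions that are not in US 5586283: a three-level table indexed by 8/6/6 address bits above a 12-bit page offset, nested dictionaries standing in for page tables, and a dictionary `ptp_tags` standing in for the tag memory holding virtual PTP tags.

```python
def translate(tables, ptp_tags, context, vaddr):
    """Return (pte, levels_visited) for a virtual address.

    tables   : {context: {i0: {i1: {i2: pte}}}}, a toy three-level page table
    ptp_tags : {(context, upper_bits): leaf_table}, cached virtual PTP tags
    """
    i0 = (vaddr >> 24) & 0xFF   # root-level index (assumed bit layout)
    i1 = (vaddr >> 18) & 0x3F   # second-level index
    i2 = (vaddr >> 12) & 0x3F   # leaf-level index
    # Compare tag: context plus a predetermined portion of the virtual address.
    tag = (context, vaddr >> 18)
    leaf = ptp_tags.get(tag)
    if leaf is not None:
        # Tag hit: jump straight to the higher-level table, skipping the
        # root and first levels of the walk.
        return leaf[i2], 1
    # Tag miss: full walk starting at the root level.
    level1 = tables[context][i0]
    leaf = level1[i1]
    ptp_tags[tag] = leaf        # remember the pointer for later misses
    return leaf[i2], 3


tables = {5: {1: {2: {3: 0xBEEF}}}}
ptp_tags = {}
vaddr = (1 << 24) | (2 << 18) | (3 << 12) | 0x45
print(translate(tables, ptp_tags, 5, vaddr))  # full walk: three levels
print(translate(tables, ptp_tags, 5, vaddr))  # tag hit: one level
```

Including the context in the tag keeps two address spaces that use the same virtual address from sharing a cached pointer.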
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Abstract (Basic! : US 5574936. A,. , . . . , 

The apparatus includes a host ALBID register for storing an
ART-lookaside buffer (ALB) identifier (ALBID) and an ALBID validity
indicator for the host mode of a logical processor, and a guest ALBID
register for storing an ALB identifier and an ALBID validity
indicator for the most recent guest mode on the logical processor. When
a host mode is initiated on the logical processor, an ALB identifier
(ALBID) is generated and stored in the host ALBID register and the ALBID
validity indicator in the host ALBID register is marked valid. When a
guest mode is first initiated on the logical processor, an ALB
identifier (ALBID) is generated and stored in the guest ALBID register
and the ALBID validity indicator in the guest ALBID register is marked
valid.

ADVANTAGE - Preserves logical integrity of access register 
translation lookaside buffer across context switches. 
Dwg. 1/11 
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Abstract (Basic) : DE 3832912 A 

The intelligent work station contains a CPU, e.g. MC68020, a
direct-mapped block-organised cache memory, a buffer memory for writing
back the cache, a tag memory (virtual address and status bits for each
cache block), a memory management unit (MMU) to assign and reclaim memory
workspace, a main memory, cache 'hit' detector logic, cache memory
flush logic and work station control logic. Also optionally, a context
ident (ID) register to identify and prioritise active processes in
multi-tasking, cache flush logic to enhance flushing performance,
direct virtual memory access (DVMA) logic, a multiplexer and a data bus
buffer to extend from 32 to 64 data bits.

Addressing is extended to contain bits defining segment, page and
block number as well as physical address, with optional context ID
bits. The operating system kernel of, e.g. UNIX (RTM), is expanded to
provide a set of commands to flush specific context, segment and page
areas of the cache, i.e. spool contents to main memory and release
cache space for reassignment. Each cache block has status bits in its
tag to define write-back data valid and/or modified, two security
bits, a supervisor's security bit and a Write Permit bit. Cache 'hit'
occurs when content is valid, CPU virtual address agrees with that in
tag memory and, optionally, when context IDs agree.

USE/ADVANTAGE - High performance 32 bit multi-user systems. Fast 
processing in cache memory proceeds transparently to user 
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7\bstract (Basic) : EP 282213 A 

The memory management unit (10) with its cache memory (102) and 
mode control (106) stores and retrieves information from a memory 
(12) using virtual address tables (404). It performs translations 
between virtual addresses received in response to instructions from 
sources having the same virtual address mapped to different physical
addresses.

The tables (404) contain translation information for each mapped 
pair of virtual and physical addresses. The pair to be utilised in 
translation is selected by the appropriate process (103,104 or 105) in 
accordance with the mode information (106) associated with each virtual
address.

USE/ADVANTAGE - In multiple concurrent processing. Interprocess 
data transfer is facilitated with no need to construct temporary 
mapping tables or to switch memory management unit back and forth 
among processes in course of execution, readout and writing. 

1/5 

Title Terms: CONCURRENT; CONTEXT; MEMORY; MANAGEMENT; UNIT; MULTIPLE;
CONCURRENT; MAP; TABLE; IDENTIFY; TRANSMISSION; BIT; DATA; TRANSFER;
OPERATE
Derwent Class: T01

International Patent Class (Main): G06F-012/10

International Patent Class (Additional): G06F-009/46; G06F-012/14

File Segment: EPI 



17/5/16 (Item 13 from file: 350) 

DIALOG (R) File 350:Derwent WPIX 
(c) 2004 Thomson Derwent. All rts. reserv.

004701137 X 

WPI Acc No: 1986-204479/198631

XRPX Acc No: N86-152765

Highlighting and classifying segments on CRT display - having segments
located in memory to satisfy matching condition so operator can
direct terminal to blink

Patent Assignee: TEKTRONIX INC (TEKT )

Inventor: DALRYMPLE J C; MAYNARD J H; PAUL B G

Number of Countries: 001 Number of Patents: 001

Patent Family:

Patent No Kind Date Applicat No Kind Date Week

US 4601021 A 19860715 US 84684962 A 19841219 198631 B

Priority Applications (No Type Date): US 82367525 A 19820412; US 84684962 A
19841219
Patent Details:



Set     Items   Description
S1       5446   (TRANSLATION OR TABLE)()(LOOKASIDE OR LOOK()ASIDE)()BUFFER?
                OR TLB OR MAP OR MAPPING OR MAPS OR MAPPED
S2       3498   CONTEXT OR CURRENT()STATUS OR CONDITION OR MODE
S3       9451   VALID? OR AUTHENTICAT? OR VERIF? OR CERTIF? OR IDENTIF?
S4       8686   MEMORY OR STORAGE OR CACHE? OR BUFFER?
S5         28   S1 AND S2 AND S3
S6          3   S5 AND S4
S7          2   S6 NOT PY>2001
S8          2   S7 NOT PD>20011009


File 256: SoftBase: Reviews, Companies & Prods. 82-2004/Feb
         (c) 2004 Info. Sources Inc



8/5/1 

DIALOG(R) File 256: SoftBase: Reviews, Companies & Prods.
(c) 2004 Info. Sources Inc. All rts. reserv.

01712116 DOCUMENT TYPE: Product 

PRODUCT NAME: Virtual-CPU (712116) 

Summit Design Inc (574511)
35 Corporate Dr 

Burlington, MA 01803 United States 
TELEPHONE: (781) 685-4954 

RECORD TYPE: Directory 

CONTACT: Sales Department 

Summit Design's Virtual-CPU is a co-verification environment that allows
embedded systems developers to analyze and validate interactions between
hardware and software. Developers can employ Virtual-CPU for system 
architecture validations before hardware implementations are complete. 
The product can be configured for any standard or customized processor or 
bus. Virtual-CPU's execution environment runs embedded system software as 
if it were running on target CPUs. These simulations are linked to logic 
simulations of embedded system hardware. Virtual-CPU can run in host-code 
execution mode on workstations or in target-code execution mode within 
an Instruction Set Simulator (ISS). Embedded system hardware is modeled in 
C, C++, or in a hardware description language. Embedded system software is 
written in C, C++, or in a target processor's assembly language. Employing 
Virtual-CPU, developers can use existing debugging tools. A memory map 
feature allows designers to swap models. 
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Sun Microsystems' Wabi 2.0 (Windows Application Binary Interface) is used 
for running Windows applications on UNIX. The earlier version's performance 
was poor and application support limited. Version 2.0 is now more powerful
and faster, and supports OLE 2.0. It can run in 386 enhanced mode, and
uses less memory. However, it still lacks sound support and support for
the Win32 APIs. The lack of application support has been overcome by
encouraging users to install Windows 3.1, which is now certified to run
with Wabi 2.0. There are 24 certified applications, which Sun indicates
make up over 80 percent of the commercial Windows market. Instead of using
emulation, Wabi uses API translation. Translation makes use of the X GUI,
by mapping Windows commands to X commands. Each user must load a copy of
Windows, which on a multi-user system can consume a substantial amount of
memory. Performance is better than the earlier release, although still
somewhat slow. 
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Abstract: A decision to implement an optical storage system raises a 
number of issues, some of which relate directly to the current capabilities 
of technology and some of which are operational. The determinants for 
success include a thorough analysis of existing operations, a clear 
requirements definition, and a careful mapping of the requirements to a 
configuration for the system. Although these steps are materially similar 
to the analysis required for any automation project, they have been less 
than well understood in the context of this storage technology. This 
paper will identify and discuss these tasks as they relate to the design 
of a complete system, using case studies to highlight the effects of work 
flows, storage algorithms, and communication requirements. In the process, 
the paper distills design experience in optical disk data management 
systems with hundreds of workstations and Terabytes of on-line storage. 
(Author abstract) 
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This dissertation presents the concepts, principles, performance, and 
implementation of input queuing and cell-scheduling modules for the 
Illinois Pulsar-based Optical INTerconnect (iPOINT) input-buffered
Asynchronous Transfer Mode (ATM) testbed. 

Input queuing (IQ) ATM switches are well suited to meet the 
requirements of current and future ultra-broadband ATM networks. The IQ
structure imposes minimum memory bandwidth requirements for cell buffering,
tolerates bursty traffic, and utilizes memory efficiently for multicast
traffic. The lack of efficient cell queuing and scheduling solutions has
been a major barrier to building high-performance, scalable IQ-based ATM
switches. This dissertation proposes a new Three-Dimensional Queue (3DQ)
and a novel Matrix Unit Cell Scheduler (MUCS) to remove this barrier.

3DQ uses a linked-list architecture based on Synchronous Random Access 
Memory (SRAM) to combine the individual advantages of per-virtual-circuit 
(per-VC) queuing, priority queuing, and N-destination queuing. It avoids 
Head of Line (HOL) blocking and provides per-VC Quality of Service (QoS) 
enforcement mechanisms. Computer simulation results verify the QoS 
capabilities of 3DQ. For multicast traffic, 3DQ provides efficient usage of 
cell buffering memory by storing multicast cells only once. Further, the 
multicast mechanism of 3DQ prevents a congested destination port from 
blocking other less-loaded ports. The 3DQ principle has been prototyped in 
the Illinois Input Queue (iiQueue) module. Using Field Programmable Gate
Array (FPGA) devices and SRAM modules integrated on a Printed Circuit
Board (PCB), iiQueue can process incoming traffic at 800 Mb/s. Using faster
circuit technology, the same design is expected to operate at the OC-48
rate (2.5 Gb/s).

MUCS resolves the output contention by evaluating the weight index of 
each candidate and selecting the heaviest. It achieves near-optimal 
scheduling and has a very short response time. The algorithm originates 
from a heuristic strategy that leads to "socially optimal" solutions, 
yielding a maximum number of contention-free cells being scheduled. A novel 
mixed digital-analog circuit has been designed to implement the MUCS core 
functionality. The MUCS circuit maps the cell scheduling computation to 
the capacitor charging and discharging procedures that are conducted fully 
in parallel. The design has a uniform circuit structure, low interconnect 
counts, and low chip I/O counts. Using 2 $\mu$m CMOS technology, the design 
operates on a 100 MHz clock and finds a near-optimal solution within a 
linear processing time. The circuit has been verified at the transistor 
level by HSPICE simulation. 

During this research, a five-port IQ-based optoelectronic iPOINT ATM 
switch has been developed and demonstrated. It has been fully functional 
with an aggregate throughput of 800 Mb/s. The second-generation IQ-based
switch is currently under development. Equipped with iiQueue modules and a
MUCS module, the new switch system will deliver a multi-gigabit aggregate
throughput, eliminate HOL blocking, provide per-VC QoS, and achieve 
near-100% link bandwidth utilization. Complete documentation of input 
modules and trunk module for the existing testbed, and complete 
documentation of 3DQ, iiQueue, and MUCS for the second-generation testbed 
are given in this dissertation. 
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Researchers have identified a core set of program transformations 
that are effective in array-based loop nest optimization: these loop 
transformations include interchange, skewing, reversal and tiling. 
Researchers have studied these transformations individually for their 
legality and effect on parallelism and memory hierarchy performance; but 
they have not discussed in any detail how to choose the combination of
transformations that best optimizes a loop nest. Other researchers have
taken another approach: they consider each loop nest as a whole, applying 
an elegant matrix theory of loop nest transformation, but one that is only 
applicable to a limited class of loop nests, those whose dependences can be 
expressed as distance vectors. In this limited context, the problems of
memory hierarchy improvement and parallelization are simplified, but their 
approach has not been extended to apply to general loop nests. 

We have combined the elegance of the matrix theory with the generality 
of general dependence vectors into a new theory of loop transformation. 
This theory has enabled us to apply an algorithmic approach to solving 
optimization goals. Using this theory, we have developed efficient 
algorithms for the compiler to use to improve memory hierarchy utilization 
and parallelism of general loop nests. The parallelization improving 
algorithm maximizes the degree of parallelism within a loop nest, at either 
a coarse or fine granularity. The locality improving algorithm uses the 
same theory, and also reuse information about array accesses within loop
nests, to guide the transformation process. The parallelization and
locality improvement algorithms are unified so that locality and
parallelism can be improved simultaneously without significantly reducing
either.

We have implemented versions of these algorithms in Stanford's SUIF 
compiler and performed experimentation on the Perfect Club and the NASA 
kernels. We have found compiler locality improvement to significantly 
improve performance when applicable. We have also demonstrated a tremendous 
sensitivity of performance on tile size for tiled codes on machines with 
direct-mapped or low set associativity caches.
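
Tiling, one of the loop transformations named in this abstract, can be shown on a small example. The sketch below is a generic illustration, not the SUIF algorithm: it tiles a matrix transpose, and the function names and the `tile` parameter are hypothetical.

```python
def transpose_naive(a, n):
    """Plain doubly nested loop: strides through b column-wise."""
    b = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            b[j][i] = a[i][j]
    return b


def transpose_tiled(a, n, tile):
    """Same computation after tiling: the i and j loops are split so that
    each tile-by-tile block of a and b is finished before moving on."""
    b = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):            # loop over tile origins
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):   # loop inside one tile
                for j in range(jj, min(jj + tile, n)):
                    b[j][i] = a[i][j]
    return b


n = 5
a = [[i * n + j for j in range(n)] for i in range(n)]
print(transpose_tiled(a, n, tile=2) == transpose_naive(a, n))
```

Tiling changes only the order of the iterations, so both versions produce the same result. The `tile` argument is the tile size whose choice the abstract reports as strongly affecting performance on direct-mapped or low set associativity caches.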
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Abstract: Many optimization procedures presume the availability of an 
initial approximation in the neighborhood of a local or global optimum. 
Unfortunately, finding a set of good starting conditions is itself a 
nontrivial proposition. The authors describe a procedure for identifying 
approximate solutions to constrained optimization problems. Recurrent 
neural network structures are interpreted in the context of linear 
associative memory matrices. A recurrent associative memory (RAM) is
trained to map the inputs of closely related transportation linear 
programs to optimal solution vectors. The procedure performs well when 
training cases are selected according to a simple rule, identifying good 
heuristic solutions for representative test cases. Modest infeasibilities
exist in some of these estimated solutions, but the basic variables 
associated with true optimums are usually apparent. In the great majority 
of cases, rounding identifies the true optimum. (18 Refs) 
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Abstract: Most multichannel data acquisition systems for digital 
measurements and control applications use a sequential data conversion 
method. Thus, the sampling instants of the analog signals are dispersed in 
time. However, synchronous sampling is desirable when the data are required 
for system identification studies, or when fast data conversion of a 
large number of analog channels is involved. A cost-effective technique is 
proposed for a microcomputer-based 8-channel synchronous data acquisition 
system. The software routine for complete data conversion and for storing 
the values in RAM takes only 93 microseconds of CPU time when used with a
single-board microcomputer SDK-85. Since the A/D converters (ADCs) are
operated in a memory-mapped mode, the system can be expanded almost
indefinitely. The circuit can handle ADCs of 12-bit resolution as well.
(10 Refs)
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Abstract: Currently cache hierarchies are indexed in parallel with a TLB 
but their tags are part of the physical address so that the memory 
hierarchy is physically addressed. This design faces problems as more 
concurrency is exploited in the processor core and as the memory demand of 
emerging applications is growing fast. The traditional TLB does not scale 
well inside the processor core and its hit rate can be poor for 
data-intensive applications or scientific applications without much 
locality. At the same time, given current trends towards computing in 
memory and in communication interfaces, virtual addresses are needed not 
just inside the processor but throughout the memory hierarchy. These 
observations have prompted us to revisit the problem of moving virtual 
address translation away from the processor. This paper introduces new 
ideas to enable the use of virtual addresses throughout the memory 
hierarchy. The major idea is the replacement of the TLB with a small 
Synonym Lookaside Buffer (SLB), which scales well because its size depends 
on the number of synonyms, and not on the size of the application or of 
the physical memory. We also characterize synonym usage, evaluate the 
amount of cache and SLB flushing due to remapping of addresses, and 
compare the miss rate of various virtual/physical cache organizations for 
several application domains. These evaluations show that
virtually-addressed memory hierarchies overall have better performance
behavior than physically-addressed memory hierarchies. Finally, we also 
show how virtually-addressed memory hierarchies facilitate natural, 
scalable multiprocessor extensions, as well as computing-in-memory in the 
context of general-purpose computers. (Author abstract) 42 Refs. 
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Abstract: Online superpage policies were evaluated in the context of 
the impulse memory controller. It was found that the presence of impulse 
changes the tradeoffs in choosing an appropriate policy and that the most 
aggressive policy becomes desirable. (Edited abstract) 4 Refs.
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Abstract: In this paper, we present a new address translation and memory 
protection model to manage the wide 64-bit virtual address space, called 
the segment-based translation and protection (SBTP) model. It partitions a
64-bit virtual address space into 2**32 segments with equal size of
2**32 bytes. The SBTP model maintains a segment table to record used
segments for each process. As a result of caching the per-process
segment table on a designed memory cache, called the segment look-aside
buffer (SLB), the virtual address translation time and protection rights
verification time can be reduced. Furthermore, by separating the hardware
mechanisms of address translation and protection, mapping information
stored in the translation look-aside buffer (TLB) can be shared by
all the processes and need not be flushed on each context switch. Thus,
the cost of context switching compared with that of conventional
architectures is greatly reduced. Simulation results show that the proposed
memory architecture effectively improves the performance of wide virtual 
address translation and memory protection for single address space 
operating systems. (Author abstract) 17 Refs. 
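
The address partition in this abstract (2**32 segments of 2**32 bytes each) can be worked through in Python. Only the 32/32 split comes from the abstract. The rights strings, the dictionary-based segment table and SLB, and the function names are hypothetical choices made for this sketch.

```python
SEGMENT_BITS = 32                        # 2**32 bytes per segment
OFFSET_MASK = (1 << SEGMENT_BITS) - 1


def split_address(vaddr):
    """Split a 64-bit virtual address into (segment number, offset)."""
    if not 0 <= vaddr < (1 << 64):
        raise ValueError("expected a 64-bit virtual address")
    return vaddr >> SEGMENT_BITS, vaddr & OFFSET_MASK


def access_allowed(vaddr, segment_table, slb, wanted):
    """Check protection rights per segment, caching entries in the SLB.

    segment_table : {segment: rights}, segments used by this process
    slb           : {segment: rights}, small cache of segment_table entries
    wanted        : a single right such as "r" or "w" (assumed encoding)
    """
    segment, _ = split_address(vaddr)
    if segment not in slb:
        if segment not in segment_table:
            return False                 # process does not use this segment
        slb[segment] = segment_table[segment]   # SLB refill on a miss
    return wanted in slb[segment]


segment_table = {0x1: "rw"}
slb = {}
print(split_address(0x0000000100000010))
print(access_allowed(0x0000000100000010, segment_table, slb, "w"))
```

In this model the rights check is kept apart from the address mapping, which mirrors the abstract's point that translation entries in the TLB can then be shared by all processes.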
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/abstract: In modern processors, the dynamic translation of virtual 
addresses to support virtual memory is done before or in parallel with the 
first-level cache access. As processor technology improves at a rapid pace 
and the working sets of new applications grow insatiably, the latency and
bandwidth demands on the TLB (Translation Lookaside Buffer) are
getting more and more difficult to meet. The situation is worse in
multiprocessor systems, which run larger applications and are plagued by
the TLB consistency problem. We evaluate and compare five options for
virtual address translation in the context of COMAs (Cache Only Memory
Architectures). The dynamic address translation mechanism can be located
after the cache access provided the cache is virtual. In a particular 
design, which we call V-COMA for Virtual COMA, the physical address concept 
and the traditional TLB are eliminated. While still supporting virtual 
memory, V-COMA reduces the address translation overhead to a minimum. 
V-COMA scales well and works better in systems with large number of 
processors. As a machine running on virtual addresses, V-COMA provides a 
simple and consistent hardware model to the operating system and the 
compiler, in which further optimization opportunities are possible. (Author 
abstract) 33 Refs.
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Abstract: Software managed translation look - aside buffers (TLBs) 
provide fine grain address translations at a virtual page. This paper
presents a novel scheme that exploits the aforementioned property to allow
a group of processes to share system resources of write-protected memory
segments. These resources consist of segment and address mapping data
structures, and software translation cache and translation look-aside
buffer entries. While this feature reduces allocation of kernel memory,
it also better utilizes address translation caches by coalescing multiple 
entries into a single entry. The idea of virtual sharing is complicated by 
two issues. First, the virtual memory sub-system may not map shareable 
segments at the same virtual addresses and access permissions for all 
participants. We present a simple and flexible VM policy which enforces 
this requirement in the presence of dynamic linking and permission changes. 
Second, the underlying hardware address translation architecture may not 
explicitly support group sharing. In this work, we propose a Group Context 
TLB architecture which allows the system to share TLB entries of virtually 
shared pages. We also present a judicious software multiplexing mechanism
that enables the operating system to share software address translation 
mappings independent of the underlying hardware characteristics. (Author
abstract) 11 Refs. 
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Abstract: The TLB ( Translation Lookaside Buffer ) miss services have 
been concealed from operating systems, but some new RISC architectures 
manage the TLB in software. Since software-managed TLBs provide flexibility 
to an operating system in page translation, they are considered an 
important factor in the design of microprocessors for open system 
environments. However, software-managed TLBs suffer from a larger miss
penalty than hardware-managed TLBs, since they require more context
switching overhead than hardware-managed TLBs. This paper introduces a new
technique for reducing the miss penalty of software-managed TLBs by 
prefetching necessary TLB entries before being used. This technique is not 
inherently limited to specific applications. The key of this scheme is to 
perform the prefetch operations to update the TLB entries before first 
accesses so that TLB misses can be avoided. Using trace-driven simulation 
and a quantitative analysis, the proposed scheme is evaluated in terms of 
the miss rate and the total miss penalty. Our results show that the 
proposed scheme reduces the TLB miss rate by a factor of 6% to 77% due to 
TLB characteristics and page sizes. In addition, it is found that reducing 
the miss rate by the prefetching scheme reduces the total miss penalty and 
bus traffics in software-managed TLBs. (Author abstract) 21 Refs. 
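
The prefetching idea in the abstract above can be illustrated with a toy
trace-driven simulation. This is only a sketch under stated assumptions, not
the authors' scheme: the fully associative LRU TLB, the "prefetch the next
page" rule, and the names simulate_tlb and install are hypothetical choices
made for illustration.

```python
from collections import OrderedDict


def simulate_tlb(trace, entries, prefetch=False):
    """Count misses for a fully associative LRU TLB over a trace of page numbers.

    Assumption (not from the abstract): when prefetch is enabled, the
    translation for page + 1 is installed after every access.
    """
    tlb = OrderedDict()  # keys are page numbers, ordered oldest to newest
    misses = 0

    def install(page):
        # Refresh an existing entry, or evict the oldest entry to make room.
        if page in tlb:
            tlb.move_to_end(page)
            return
        if len(tlb) >= entries:
            tlb.popitem(last=False)
        tlb[page] = True

    for page in trace:
        if page not in tlb:
            misses += 1
        install(page)
        if prefetch:
            install(page + 1)  # hypothetical next-page prefetch
    return misses


if __name__ == "__main__":
    trace = [0, 0, 1, 1, 2, 2, 3, 3]
    print("misses without prefetch:", simulate_tlb(trace, entries=4))
    print("misses with prefetch:", simulate_tlb(trace, entries=4, prefetch=True))
```

On a purely sequential trace like the one above, only the first access
misses once prefetching is on. Real traces would show smaller gains, which is
consistent with the wide range of reductions the abstract reports.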
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Abstract: This paper presents the results of a simulation-based study of 
various translation lookaside buffer (TLB) architectures, in the 
context of a modern VLSI RISC processor. The simulators used address 
traces, generated by instrumented versions of the SPECmarks and several 
other programs running on a DECstation 5000. The performance of two-level
TLBs and fully-associative TLBs were investigated. The amount of memory
mapped was found to be the dominant factor in TLB performance. Small
first-level FIFO instruction TLBs can be effective in two-level TLB
configurations. For some applications, the cycles-per-instruction (CPI)
loss due to TLB misses can be reduced from as much as 5 CPI to negligible 
levels with typical TLB parameters through the use of variable-sized pages. 
(Author abstract) 12 Refs. 

Descriptors: *Computer simulation; Computer architecture; Program
processors; Computer systems programming; VLSI circuits 

Identifiers: Translation lookaside buffer (TLB); Address traces; 
Cycles per instruction (CPI) 

Classification Codes: 

714.2 (Semiconductor Devices & Integrated Circuits) 

722 (Computer Hardware); 723 (Computer Software); 714 (Electronic 
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02076721 E.I. Monthly No: EIM8 603-014085 
Title: OBJECT ORIENTED ARCHITECTURE. 

Author: Dally, William J.; Kajiya, James T. 

Corporate Source: California Inst of Technology, Pasadena, CA, USA
Conference Title: Conference Proceedings - 12th Annual International
Symposium on Computer Architecture.
Conference Location: Boston, MA, USA Conference Date: 19850717
Sponsor: IEEE Computer Soc, Technical Committee on Architecture, Los
Alamitos, CA, USA; ACM, Special Interest Group on Architecture, New York,
NY, USA; IEEE, New York, NY, USA.
E.I. Conference No.: 07650 

Source: Conference Proceedings - Annual Symposium on Computer
Architecture 12th. Publ by IEEE, New York, NY, USA Available from IEEE
Service Cent (Cat n 85CH2144-4), Piscataway, NJ, USA p 154-161

Publication Year: 1985 

CODEN: CPAADU ISSN: 0149-7111 ISBN: 0-8186-0634-7 
Language: English 

Document Type: PA; (Conference Paper) 
Journal Announcement: 8603 

Abstract: A new machine architecture for high-performance execution of 
late-binding object-oriented languages is proposed. The two principal 
mechanisms for attaining this goal are a fast context allocation/access 
scheme and an instruction translation lookaside buffer. The concept
and implementation of abstract instructions, using floating point addresses 
to solve the small-object problem, and a novel context allocation/access 
mechanism are discussed. 19 refs. 

Descriptors: *COMPUTER ARCHITECTURE; COMPUTER PROGRAMMING LANGUAGES --
Problem Orientation

Identifiers: ALLOCATION/ACCESS SCHEMES; FLOATING POINT ADDRESSES
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Author: Anon 
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Publication Year: 1985 

CODEN: IBMTAA ISSN: 0018-8689 

Language: ENGLISH

Document Type: JA; (Journal Article) Treatment: A; (Applications)
Journal Announcement: 8508

Abstract: This article describes the dual purpose use of translation 
look-aside buffer (TLB) registers that eliminate the need for an
address register, which is normally used only to store the current address
during an error condition. The TLB real registers are used as dual
purpose registers. 

Descriptors: *DATA STORAGE, DIGITAL 

Identifiers: TLB REGISTERS; TRANSLATION LOOK-ASIDE BUFFER (TLB)
Classification Codes:

721 (Computer Circuits & Logic Elements); 722 (Computer Hardware)
72 (COMPUTERS & DATA PROCESSING)



10/5/10 (Item 1 from file: 2)
DIALOG(R)File 2:INSPEC
(c) 2004 Institution of Electrical Engineers. All rts. reserv.

7220203 INSPEC Abstract Number: C2002-04-6120-040
Title: Further cache and TLB investigation of the RAMpage memory hierarchy 

Author(s): Machanick, P.; Patel, Z.

Author Affiliation: Sch. of Comput. Sci., Univ. of the Witwatersrand,
Wits, South Africa

Conference Title: Hardware, Software and Peopleware. South African 
Institute of Computer Scientist and Information Technologists Annual 
Conference p. 225 

Editor(s): Renaud, K.; Kotze, P.; Barnard, A.

Publisher: Unisa Press, Pretoria, South Africa 

Publication Date: 2001 Country of Publication: South Africa xix+226 
pp. 

ISBN: 1 86888 195 4 Material Identity Number: XX-2001-02851 

Conference Title: Proceedings of SAICSIT 2001. South African Institute of
Computer Science and Information Technology Annual Conference
Conference Date: 25-28 Sept. 2001 Conference Location: Pretoria, South
Africa

Language: English Document Type: Conference Paper (PA) 
Treatment: Practical (P) 

Abstract: Summary form only given. The RAMpage memory hierarchy is an
alternative to the traditional division between cache and main memory: main
memory is moved up a level and DRAM is used as a paging device. Earlier
RAMpage work has shown that the RAMpage model scales up better with the
growing CPU-DRAM speed gap, especially when context switches are taken on
misses. This paper investigates the effect of more aggressive first-level
(L1) cache and translation lookaside buffer (TLB) implementations,
with other parameters kept the same as in previous work, to illustrate that
a more aggressive design improves the competitiveness of RAMpage. The more
aggressive L1 shows an increase in the advantage of RAMpage with context
switches on misses, supporting the hypothesis that a more aggressive L1
favours RAMpage. However, results without context switches on misses are 
less conclusive. A larger TLB, as predicted, makes RAMpage viable over a 
wider range of page sizes. 
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Author Affiliation: Dept. of Comput. Sci., Utah Univ., Salt Lake City, 
UT, USA 

Journal: Performance Evaluation Review Conference Title: Perform. Eval. 
Rev. (USA) vol.28, no. 1 p. 114-15

Publisher: ACM, 

Publication Date: June 2000 Country of Publication: USA 

CODEN: PEREDN ISSN: 0163-5999 
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Conference Title: ACM SIGMETRICS '2000. International Conference on 
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Conference Date: 17-21 June 2000 Conference Location: Santa Clara, CA, 
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Language; English Document Type: Conference Paper (PA); Journal Paper 
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Treatment: Practical (P) 

Abstract: The amount of data that a typical translation lookaside
buffer (TLB) can map has not kept pace with the growth in cache sizes and
application footprints. As a result, the cost of handling TLB misses limits
the performance of an increasing number of applications. The use of 
superpages, multiple adjacent virtual memory pages that can be mapped with 
a single TLB entry, extends a TLB's reach without significantly increasing 
its size or cost. Previous studies have shown that simple online policies 
that decide to create superpages dynamically can be effective in reducing 
TLB penalties. We reevaluate online superpage policies in the context of 
the impulse memory controller, which supports no-copy superpage 
construction. Our results show that the presence of impulse changes the 
tradeoffs in choosing an appropriate policy, and that the most aggressive 
policy becomes desirable. (4 Refs) 
Subfile: C 
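
As a rough illustration of why superpages extend TLB reach, the arithmetic
can be sketched as follows. The function name tlb_reach, the 64-entry TLB,
and the 4 KiB and 4 MiB page sizes are assumed example values, not figures
taken from the record above.

```python
def tlb_reach(entries, page_size_bytes):
    """Return the number of bytes a TLB can map when every entry is in use."""
    return entries * page_size_bytes


if __name__ == "__main__":
    KIB = 1024
    MIB = 1024 * KIB
    entries = 64  # assumed TLB size, for illustration only

    base = tlb_reach(entries, 4 * KIB)    # conventional base pages
    superp = tlb_reach(entries, 4 * MIB)  # every entry maps a superpage
    print("reach with 4 KiB pages:", base // KIB, "KiB")
    print("reach with 4 MiB superpages:", superp // MIB, "MiB")
```

The entry count stays fixed while the mapped range grows with the page size,
which is the sense in which a superpage extends reach without enlarging the
TLB itself.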
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Author(s): Moon-Seok Chang; Kern Koh; Joon-Won Lee; Hae-Jin Kim 

Journal: Journal of KISS (C) (Computing Practices) vol.5, no. 2 p.
236-46

Publisher: Korea Inf. Sci. Soc, 

Publication Date: April 1999 Country of Publication: South Korea 
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Material Identity Number: E347-1999-004 

Language: Korean Document Type: Journal Paper (JP) 

Treatment: Practical (P) 

Abstract: A recent trend in operating system development is microkernel 
design. Microkernels require a newly designed tool for performance analysis
and tuning, since they have a different structure compared to the 
monolithic ones. In this paper, we present MKperf, a performance tool for a 
microkernel-based operating system. MKperf has the capability to monitor
context switches and remote procedure calls, which are important
activities for the performance of a microkernel. In addition, this tool
monitors various hardware events that are critical to the performance of
memory systems, including caches and TLBs (translation lookaside
buffers). As a result, MKperf provides useful information for the
performance analysis of microkernel-based operating systems. (16 Refs) 
Subfile: C 
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Journal: Proceedings of the National Science Council, Republic of China, 
Part A (Physical Science and Engineering) vol.22, no. 5 p. 616-25

Publisher: Natl. Sci. Council, Taiwan, 

Publication Date: Sept. 1998 Country of Publication: Taiwan 
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Abstract: In this paper, we present a new address translation and memory 
protection model to manage the wide 64-bit virtual address space, called 
the segment-based translation and protection (SBTP) model. It partitions a 
64-bit virtual address space into 2/sup 32/ segments with equal size of 
2/sup 32/ bytes. The SBTP model maintains a segment table to record used 
segments for each process. As a result of caching the per-process basis 
segment table on a designed memory cache, called the segment look-aside 
buffer (SLB), the virtual address translation time and protection rights 
verification time can be reduced. Furthermore, by separating the hardware 
mechanisms of address translation and protection, mapping information 
stored in the translation look-aside buffer (TLB) can be shared by
all the processes and need not be flushed on each context switch. Thus,
the cost of context switching compared with that of conventional
architectures is greatly reduced. Simulation results show that the proposed 
memory architecture effectively improves the performance of wide virtual 
address translation and memory protection for single address space 
operating systems. (17 Refs) 
Subfile: C 
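
The address partitioning described in the abstract above (2/sup 32/ segments
of 2/sup 32/ bytes each) can be sketched as a simple bit split. This is a
minimal illustration only: the names split_address and segment_in_use, and
the use of a Python set as the per-process segment table, are assumptions
and not details of the SBTP design.

```python
SEGMENT_BITS = 32  # the abstract states 2^32 segments of 2^32 bytes each
OFFSET_MASK = (1 << SEGMENT_BITS) - 1


def split_address(vaddr):
    """Split a 64-bit virtual address into (segment number, offset)."""
    if not 0 <= vaddr < (1 << 64):
        raise ValueError("address must fit in 64 bits")
    return vaddr >> SEGMENT_BITS, vaddr & OFFSET_MASK


def segment_in_use(segment_table, vaddr):
    """Check a hypothetical per-process segment table (a set of segment numbers)."""
    segment, _offset = split_address(vaddr)
    return segment in segment_table


if __name__ == "__main__":
    table = {1}  # this example process uses segment 1 only
    print(split_address(0x0000000100000010))
    print(segment_in_use(table, 0x0000000100000010))
    print(segment_in_use(table, 0x0000000200000000))
```

A segment look-aside buffer, as the abstract describes it, would cache the
result of the table check so that most accesses avoid the lookup; that
caching layer is left out of this sketch.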
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Conference Title: Proceedings. 25th Annual International Symposium on 
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ISBN: 0 8186 8491 7 Material Identity Number: XX98-01756 

U.S. Copyright Clearance Center Code: 1063-6897/98/$10.00

Conference Title: Proceedings of ISCA '98: International Symposium on
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Comissio Interdept. Recerca i Innovacio Tecnol . (CIRIT) ; Univ. Politech. 
Catalunya (UPC) 

Conference Date: 27 June-1 July 1998 Conference Location: Barcelona, 
Spain 

Language: English Document Type: Conference Paper (PA)

Treatment: Applications (A); Practical (P) 

Abstract: In modern processors, the dynamic translation of virtual 
addresses to support virtual memory is done before or in parallel with the 
first-level cache access. As processor technology improves at a rapid pace 
and the working sets of new applications grow insatiably, the latency and
bandwidth demands on the TLB (Translation Lookaside Buffer) are
getting more and more difficult to meet. The situation is worse in
multiprocessor systems, which run larger applications and are plagued by
the TLB consistency problem. We evaluate and compare five options for
virtual address translation in the context of COMAs (Cache Only Memory
Architectures). The dynamic address translation mechanism can be located
after the cache access provided the cache is virtual. In a particular 
design, which we call V-COMA for Virtual COMA, the physical address concept 
and the traditional TLB are eliminated. While still supporting virtual 
memory, V-COMA reduces the address translation overhead to a minimum. 
V-COMA scales well and works better in systems with large number of 
processors. As a machine running on virtual addresses, V-COMA provides a 
simple and consistent hardware model to the operating system and the 
compiler, in which further optimization opportunities are possible. (33
Refs)

Subfile: C 
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Language: English Document Type: Journal Paper (JP) 
Treatment: Practical (P) 

Abstract: The TLB (Translation Lookaside Buffer) miss services have
been concealed from operating systems, but some new RISC architectures 
manage the TLB in software. Since software-managed TLBs provide flexibility 
to an operating system in page translation, they are considered an 
important factor in the design of microprocessors for open system 
environments. However, software-managed TLBs suffer from larger miss 
penalty than hardware-managed TLBs, since they require more extra context 
switching overhead than hardware-managed TLBs. This paper introduces a new 
technique for reducing the miss penalty of software-managed TLBs by 
prefetching necessary TLB entries before being used. This technique is not 
inherently limited to specific applications. The key of this scheme is to 
perform the prefetch operations to update the TLB entries before first 
accesses so that TLB misses can be avoided. Using trace-driven simulation 
and a quantitative analysis, the proposed scheme is evaluated in terms of 
the miss rate and the total miss penalty. Our results show that the
proposed scheme reduces the TLB miss rate by a factor of 6% to 77% due to
TLB characteristics and page sizes. In addition, it is found that reducing
the miss rate by the prefetching scheme reduces the total miss penalty and
bus traffics in software-managed TLBs. (21 Refs) 
Subfile: C 
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Language: English Document Type: Conference Paper (PA) 
Treatment: Practical (P) 

Abstract: The Motorola 88110 symmetric superscalar microprocessor's 
memory system and bus interface unit is composed of an 8 KByte instruction 
cache, an 8 KByte data cache, independent instruction and data translation
lookaside buffers, and a bus interface unit that will handle split bus
transactions. The memory system and bus interface have been optimized for 
high hit rates, fast context switching, reduced bus traffic, and high 
bandwidth. The 88110 also provides flexibility in address translations and 
hardware multiprocessor features. (2 Refs) 
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Treatment: Practical (P) 

Abstract: The slotted virtual memory management scheme which enhances 
performance of inter-object communication within the same machine in 
operating systems is proposed. It divides a virtual space into equal sized
fragments, called slots, and gives each object one or more slots. It
enables the following techniques: context switches and flushes of both
translation lookaside buffer (TLB) entries and cache entries, become 
unnecessary, by placing several objects into the same space; frequencies of 
communication between virtual spaces can be reduced by moving objects 
dynamically; the execution of extra paths can be avoided by replacing them 
with local procedure calls. Measures on the Apertos operating system are 
carried out. (15 Refs) 
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(Fast inter-object communication using slotted virtual space) 
Mitsuzawa, A; Yokote, Y; Tokoro, M 
Dept. of Comput. Sci., Keio Univ., Tokyo, Japan

Transactions of Information Processing Society of Japan, v34, n5, 
pp994-1009, 1993 

Document type: journal article Language: Japanese 
Record type: Abstract 

ABSTRACT: 

The slotted virtual memory management scheme which enhances performance of 
inter-object communication within the same machine in operating systems is 
proposed. It divides a virtual space into equal sized fragments, called 
slots, and gives each object one or more slots. It enables the following 
techniques: context switches and flushes of both translation lookaside
buffer (TLB) entries and cache entries, become unnecessary, by placing 
several objects into the same space; frequencies of communication between 
virtual spaces can be reduced by moving objects dynamically; the execution 
of extra paths can be avoided by replacing them with local procedure calls. 
Measures on the Apertos operating system are carried out. 

DESCRIPTORS: BUFFER STORAGE; OPERATING SYSTEM — COMPUTERS; VIRTUAL MEMORY; 
DATA COMMUNICATION; OBJECT ORIENTED PROGRAMMING; MEMORY MANAGEMENT; CACHE 
MEMORIES 

IDENTIFIERS: OPERATING SYSTEMS; FAST INTER OBJECT COMMUNICATION; SLOTTED 
VIRTUAL SPACE; SLOTTED VIRTUAL MEMORY MANAGEMENT SCHEME; CONTEXT SWITCHES;
FLUSHES; TRANSLATION LOOKASIDE BUFFER; CACHE ENTRIES; TLB; LOCAL
PROCEDURE CALLS; APERTOS OPERATING SYSTEM; Betriebssystem;
Objektkommunikation



Set     Items    Description
S1     380624    (TRANSLATION OR TABLE)()(LOOKASIDE OR LOOK()ASIDE)()BUFFER?
                 OR TLB OR MAP OR MAPPING OR MAPS OR MAPPED
S2     960707    CONTEXT OR CURRENT()STATUS OR CONDITION OR MODE
S3    2595734    VALID? OR AUTHENTICAT? OR VERIF? OR CERTIF? OR IDENTIF?
S4    1382265    MEMORY OR STORAGE OR CACHE? OR BUFFER?
S5       1120    S1 (S) S2 (S) S3
S6       6357    S2 (3N) S4
S7        253    S1 (S) S6
S8       4064    S3 (2N) (FLAG? OR INDICATOR? OR POINTER?)
S9          0    S7 (S) S8
S10        17    S7 (S) S3
S11         0    S5 (S) (DEMAPPING OR DEMAP OR DEMAPS OR DEMAPPED)
S12        16    S10 NOT PY>2001
S13        15    S12 NOT PD>20011009
S14        13    RD (unique items)
S15       650    TRANSLATION()(LOOKASIDE OR LOOK()ASIDE)()BUFFER?
S16        27    S15 (S) S2
S17         1    S16 (S) S3
S18        27    S16 OR S17
S19        22    S18 NOT PY>2001
S20        22    S19 NOT PD>20011009
S21        20    RD (unique items)
S22        20    S21 NOT S14
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A software-controlled prefetching mechanism for software -managed TLBs 

Park, Jang Suk; Ahn, Gwang Seon 

Microprocessing & Microprogramming v41n2 PP: 121-136 May 1995
ISSN: 0165-6074 JRNL CODE: EUJ 

ABSTRACT: The translation lookaside buffer (TLB) miss services have 
been concealed from operating systems, but some new RISC architectures
manage the TLB...

software-managed TLBs suffer from larger miss penalty than 
hardware-managed TLBs, since they require more extra context switching 
overhead than hardware-managed TLBs. A new technique is introduced for 
reducing the miss penalty of...
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ISSN: 0001-0782 JRNL CODE: ACM 
WORD COUNT: 7830

...TEXT: includes a Processor States (PS) word, Kernel and User stack
pointers, a Process Control Block base for context switching, a 
Process-unique value for threads, and a processor number for multiprocessor 
dispatching. Additional PALcode states may include floating-point enable 
bit, interrupt priority level, and translation lookaside buffers for 
mapping instruction-stream and data-stream virtual addresses. All this 
state is soft, in the sense... 
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cache and 2-kbyte data cache; a single-cycle 32-bit multiplier; a 
16-entry MMU with translation lookaside buffer, and support for four
banks of ROM with 8-, 16- and 32-bit interfaces, as well as... 

...first member of the 29000 family to offer four-channel DMA support, 
with a fly-by operation mode to allow speeds in excess of 2,000 
Mbytes/second in the DMA channel. 
For the low...
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TEXT: 

...a cache for the desired instruction. On a slow path to main memory
is a large main translation lookaside buffer (TLB) that holds address
translations. On a fast path is a smaller translation write buffer (TWB), a

...the contents of the cache for a hit. The guess access is allowed to 
proceed upon the condition that there is a hit in the TWB (the TWB is 
able to translate the logical address... 

...a physical address) and a miss in the I-cache. The guess access is
canceled upon the condition that there is either a miss in the TWB (the 
TWB is unable to translate the logical... 
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HP showing architecture for 64-bit PA-RISC MPU. (HP developing PA-8000 
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. . . fetches up to four quadword-aligned instructions per cycle, a large 

(56-entry) reorder buffer, branch prediction mode, an address reorder
buffer (ARB) to the dual-ported/single-level off-chip data cache, 96-entry 
translation lookaside buffer (TLB), and support for up to eight-way 
multiprocessing without any external glue logic. 
In what could . . . 
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translation from 32-bit virtual address to the 36-bit physical 
address called for by MBus. A context register is used to identify up
to 65,000 contexts. A translation look-aside buffer (TLB) completes
address translation and improves the MMU's translation performance. The TLB
supports 64 entries as... 
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... hence in realtime systems. Determinism and throughput may be 

adversely affected by table walks, page faults and context switches. 
Translation look-aside buffers that are lockable can restore
determinism and performance. 

* Pipelines and instruction units. Depending on the implementation, a
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... a program, the initialization code sets up the system resources 

(control registers, general registers, floating-point registers, 
translation lookaside buffers or TLBs, etc.) to certain states. The 
initialization code runs in real mode. It turns on the virtual mode
before execution switches to the random code. The necessary setup for
virtual addressing mode includes TLB entries, address queues, the
processor status word, and so on. The switch from real to virtual mode
takes place on the last instruction in the initialization code. Once in
virtual mode, execution of instructions on the random code page continues
indefinitely until a recovery counter trap occurs.
BPS...
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... enhanced its forthcoming implementation of IBM's Enterprise Systems 

Architecture with proprietary hardware accelerators for Access Register 
Mode, optimised to reduce the time needed to translate data addresses
using access registers for access to ESA data spaces. The AS/EX machines
also use access register translation lookaside buffers to improve
performance, with storage of up to 256 recently-used addresses. Enterprise 
Systems Architecture support... 
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The CLIPPER microprocessor uses caching and virtual memory as the 
standard mode of operation. The associated CAMMU chips each contain a 4 
Kbyte cache, a translation lookaside buffer (TLB), and a translator. 
One CAMMU is used for instruction references and the other for data; the... 
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are 16-bit modes whose only purpose is to provide for backward 
compatibility.

In 32-bit protected mode the 80386 is a much more powerful
device than any of its Intel predecessors. At 32 bits...

...address space. The paging hardware needed to manage these large virtual
spaces is on-chip, including a translation look-aside buffer that
caches the most recently used page table entries to achieve added
performance. As a result of these and other enhancements, the 32-bit mode
of the 80386 is well able to support UNIX and other demanding operating
systems.

The other three...
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on-chip design efforts on making bus cycles efficient, rather than 
on implementing a cache or burst-mode controller. "We designed a
pipelined address mode for the 80386 that makes address and control
signals available before the end of the preceding bus...

...states, letting the CPU run faster. Also, the chip's prefetch queue and
the memory manager's translation lookaside buffer tend to cut down on
bus loading. As a result, about one-half of our customers running...
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...ABSTRACT: Mips. The microprocessor has a MMU, a cache controller, 512 
Kbytes of instruction and data caches, a translation look-aside
buffer, and a floating point coprocessor interface. The floating point
coprocessor is the R3010, which supports IEEE standard...

...buffer allows the processor to do write operations during run cycles. 
All of the chips have a mode pin which, when activated, causes them to 
function like the previous generation R2000 devices. 
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. . , accommodate one or more users demanding multiuser-multitasking 

capability for business and engineering applications. 

In a standalone mode with the WE 32200 microprocessor, the 
memory-management cache unit typically supplies a 99.6% hit rate in the 
descriptor cache, or translation lookaside buffer (TLB), and 80% to
85% hit rate in the data cache. The hit rate is the probability... 
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... It can rapidly translate to physical addresses by means of an 

on-chip, 64-entry fully associative translation look-aside buffer
that accesses dual 8-kbyte data and instruction caches.
Along the same lines, MIPS plans to add...
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the cache control register. This chip also has the ability to lock 
page descriptors in the MMU translation look-aside buffer, which
might otherwise be flushed during a context switch. 
Embedded Application Development Tools 

The success of RISC processors in embedded applications will depend 
greatly on...
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. . . fetches up to four quadword-aligned instructions per cycle, a larg 

(56-entry) reorder buffer, branch prediction mode, an address reorder
buffer (ARB) to the dual-ported/single-level off-chip data cache, 96-entry
translation lookaside buffer (TLB), and support for up to eight-way
multiprocessing without any external glue logic. 
In what could. . . 
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support the MIPS Instruction Set Architecture (ISA) . 
To support efficient embedded applications, HDL started by removing 
the Translation Lookaside Buffer (TLB). HDL estimates that removing
the TLB and all co-processor zero registers, which supported the memory...

...embedded applications, HDL also implemented the MR300 as a physical 
address space processor, retaining User- and Kernel-mode memory
protection and providing access to the full 4-Gbyte address space. 
HDL has extensively redesigned the. . . 
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. . . associative 8-kbyte instruction cache and a 2-kbyte data cache, as 

well as support for burst mode (required in high-performance memory 
systems). A translation lookaside buffer permits caching of the most
recent translations. 

The central integer unit uses a five-stage pipeline that... 
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... instruction buses), on-chip branch target chache for caching up to 
32 jump targets, an on-chip translation lookaside buffer to demand 
paged memory scheme, and a register file that can be used as a stack cache 

... the specific instruction of the user. The bus interface unit is capable 
of getting instructions in burst mode so that once the latency of a 
nonsequential fetch is over the processor can once again maintain...